From nobody Fri Jun 26 13:44:01 2026 Date: Tue, 22 Feb 2022 21:21:37 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-2-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 01/47] mm: asi: Introduce ASI core API From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce core API for Address Space Isolation (ASI). Kernel address space isolation provides the ability to run some kernel code with a reduced kernel address space. There can be multiple classes of such restricted kernel address spaces (e.g. KPTI, KVM-PTI etc.). Each ASI class is identified by an index. 
The ASI class can register hooks to be called when entering/exiting the restricted address space. Currently, there is a fixed maximum number of ASI classes supported. In addition, each process can have at most one restricted address space from each ASI class. Neither of these is an inherent limitation; both are merely simplifying assumptions for the time being. (The Kconfig and the high-level ASI API are derived from the original ASI RFC by Alexandre Chartre.) Originally-by: Alexandre Chartre Signed-off-by: Junaid Shahid --- arch/alpha/include/asm/Kbuild | 1 + arch/arc/include/asm/Kbuild | 1 + arch/arm/include/asm/Kbuild | 1 + arch/arm64/include/asm/Kbuild | 1 + arch/csky/include/asm/Kbuild | 1 + arch/h8300/include/asm/Kbuild | 1 + arch/hexagon/include/asm/Kbuild | 1 + arch/ia64/include/asm/Kbuild | 1 + arch/m68k/include/asm/Kbuild | 1 + arch/microblaze/include/asm/Kbuild | 1 + arch/mips/include/asm/Kbuild | 1 + arch/nds32/include/asm/Kbuild | 1 + arch/nios2/include/asm/Kbuild | 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/Kbuild | 1 + arch/riscv/include/asm/Kbuild | 1 + arch/s390/include/asm/Kbuild | 1 + arch/sh/include/asm/Kbuild | 1 + arch/sparc/include/asm/Kbuild | 1 + arch/um/include/asm/Kbuild | 1 + arch/x86/include/asm/asi.h | 81 +++++++++++++++ arch/x86/include/asm/tlbflush.h | 2 + arch/x86/mm/Makefile | 1 + arch/x86/mm/asi.c | 152 +++++++++++++++++++++++++++++ arch/x86/mm/init.c | 5 +- arch/x86/mm/tlb.c | 2 +- arch/xtensa/include/asm/Kbuild | 1 + include/asm-generic/asi.h | 51 ++++++++++ include/linux/mm_types.h | 3 + kernel/fork.c | 3 + security/Kconfig | 10 ++ 32 files changed, 329 insertions(+), 3 deletions(-) create mode 100644 arch/x86/include/asm/asi.h create mode 100644 arch/x86/mm/asi.c create mode 100644 include/asm-generic/asi.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index 42911c8340c7..e3cd063d9cca 100644 --- a/arch/alpha/include/asm/Kbuild +++ 
b/arch/alpha/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y +=3D syscall_table.h generic-y +=3D export.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h +generic-y +=3D asi.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index 3c1afa524b9c..60bdeffa7c31 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D parport.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 03657ff8fbe3..1e2c3d8dbbd9 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y +=3D parport.h =20 generated-y +=3D mach-types.h generated-y +=3D unistd-nr.h +generic-y +=3D asi.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 64202010b700..086e94f00f94 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -4,5 +4,6 @@ generic-y +=3D mcs_spinlock.h generic-y +=3D qrwlock.h generic-y +=3D qspinlock.h generic-y +=3D user.h +generic-y +=3D asi.h =20 generated-y +=3D cpucaps.h diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild index 904a18a818be..b4af49fa48c3 100644 --- a/arch/csky/include/asm/Kbuild +++ b/arch/csky/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y +=3D kvm_para.h generic-y +=3D qrwlock.h generic-y +=3D user.h generic-y +=3D vmlinux.lds.h +generic-y +=3D asi.h diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild index e23139c8fc0d..f1e937df4c8e 100644 --- a/arch/h8300/include/asm/Kbuild +++ b/arch/h8300/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D parport.h generic-y +=3D spinlock.h +generic-y +=3D asi.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbu= ild index 3ece3c93fe08..744ffbeeb7ae 100644 --- a/arch/hexagon/include/asm/Kbuild +++ 
b/arch/hexagon/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y +=3D extable.h generic-y +=3D iomap.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h +generic-y +=3D asi.h diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index f994c1daf9d4..897a388f3e85 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -3,3 +3,4 @@ generated-y +=3D syscall_table.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D vtime.h +generic-y +=3D asi.h diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index 0dbf9c5c6fae..faf0f135df4a 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y +=3D extable.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D spinlock.h +generic-y +=3D asi.h diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/a= sm/Kbuild index a055f5dbe00a..012e4bf83c13 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y +=3D parport.h generic-y +=3D syscalls.h generic-y +=3D tlb.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild index dee172716581..b2c7b62536b4 100644 --- a/arch/mips/include/asm/Kbuild +++ b/arch/mips/include/asm/Kbuild @@ -14,3 +14,4 @@ generic-y +=3D parport.h generic-y +=3D qrwlock.h generic-y +=3D qspinlock.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild index 82a4453c9c2d..e8c4cf63db79 100644 --- a/arch/nds32/include/asm/Kbuild +++ b/arch/nds32/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y +=3D gpio.h generic-y +=3D kvm_para.h generic-y +=3D parport.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild index 7fe7437555fb..bfdc4026c5b1 100644 --- a/arch/nios2/include/asm/Kbuild +++ 
b/arch/nios2/include/asm/Kbuild @@ -5,3 +5,4 @@ generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D spinlock.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/K= build index ca5987e11053..3d365bec74d0 100644 --- a/arch/openrisc/include/asm/Kbuild +++ b/arch/openrisc/include/asm/Kbuild @@ -7,3 +7,4 @@ generic-y +=3D qspinlock.h generic-y +=3D qrwlock_types.h generic-y +=3D qrwlock.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild index e6e7f74c8ac9..b14e4f727331 100644 --- a/arch/parisc/include/asm/Kbuild +++ b/arch/parisc/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y +=3D syscall_table_64.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbu= ild index bcf95ce0964f..2aff0fa469c4 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y +=3D mcs_spinlock.h generic-y +=3D qrwlock.h generic-y +=3D vtime.h generic-y +=3D early_ioremap.h +generic-y +=3D asi.h diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild index 445ccc97305a..3e2022a5a6c5 100644 --- a/arch/riscv/include/asm/Kbuild +++ b/arch/riscv/include/asm/Kbuild @@ -5,3 +5,4 @@ generic-y +=3D flat.h generic-y +=3D kvm_para.h generic-y +=3D user.h generic-y +=3D vmlinux.lds.h +generic-y +=3D asi.h diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 1a18d7b82f86..ef80906ed195 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y +=3D asm-offsets.h generic-y +=3D export.h generic-y +=3D kvm_types.h generic-y +=3D mcs_spinlock.h +generic-y +=3D asi.h diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild index fc44d9c88b41..ea19e4515828 100644 --- 
a/arch/sh/include/asm/Kbuild +++ b/arch/sh/include/asm/Kbuild @@ -3,3 +3,4 @@ generated-y +=3D syscall_table.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h generic-y +=3D parport.h +generic-y +=3D asi.h diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index 0b9d98ced34a..08730a26aaed 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y +=3D syscall_table_64.h generic-y +=3D export.h generic-y +=3D kvm_para.h generic-y +=3D mcs_spinlock.h +generic-y +=3D asi.h diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index e5a7b552bb38..b62245b2445a 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -27,3 +27,4 @@ generic-y +=3D word-at-a-time.h generic-y +=3D kprobes.h generic-y +=3D mm_hooks.h generic-y +=3D vga.h +generic-y +=3D asi.h diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h new file mode 100644 index 000000000000..f9fc928a555d --- /dev/null +++ b/arch/x86/include/asm/asi.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_ASI_H +#define _ASM_X86_ASI_H + +#include + +#include +#include + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +#define ASI_MAX_NUM_ORDER 2 +#define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) + +struct asi_state { + struct asi *curr_asi; + struct asi *target_asi; +}; + +struct asi_hooks { + /* Both of these functions MUST be idempotent and re-entrant. 
*/ + + void (*post_asi_enter)(void); + void (*pre_asi_exit)(void); +}; + +struct asi_class { + struct asi_hooks ops; + uint flags; + const char *name; +}; + +struct asi { + pgd_t *pgd; + struct asi_class *class; + struct mm_struct *mm; +}; + +DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); + +void asi_init_mm_state(struct mm_struct *mm); + +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops); +void asi_unregister_class(int index); + +int asi_init(struct mm_struct *mm, int asi_index); +void asi_destroy(struct asi *asi); + +void asi_enter(struct asi *asi); +void asi_exit(void); + +static inline void asi_set_target_unrestricted(void) +{ + barrier(); + this_cpu_write(asi_cpu_state.target_asi, NULL); +} + +static inline struct asi *asi_get_current(void) +{ + return this_cpu_read(asi_cpu_state.curr_asi); +} + +static inline struct asi *asi_get_target(void) +{ + return this_cpu_read(asi_cpu_state.target_asi); +} + +static inline bool is_asi_active(void) +{ + return (bool)asi_get_current(); +} + +static inline bool asi_is_target_unrestricted(void) +{ + return !asi_get_target(); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +#endif diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index b587a9ee9cb2..3c43ad46c14a 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -259,6 +259,8 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlb= flush_unmap_batch *batch, =20 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); =20 +unsigned long build_cr3(pgd_t *pgd, u16 asid); + #endif /* !MODULE */ =20 #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 5864219221ca..09d5e65e47c8 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_NUMA_EMU) +=3D numa_emulation.o obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) +=3D pkeys.o obj-$(CONFIG_RANDOMIZE_MEMORY) +=3D kaslr.o 
obj-$(CONFIG_PAGE_TABLE_ISOLATION) +=3D pti.o +obj-$(CONFIG_ADDRESS_SPACE_ISOLATION) +=3D asi.o =20 obj-$(CONFIG_AMD_MEM_ENCRYPT) +=3D mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) +=3D mem_encrypt_identity.o diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c new file mode 100644 index 000000000000..9928325f3787 --- /dev/null +++ b/arch/x86/mm/asi.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include + +#undef pr_fmt +#define pr_fmt(fmt) "ASI: " fmt + +static struct asi_class asi_class[ASI_MAX_NUM]; +static DEFINE_SPINLOCK(asi_class_lock); + +DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); + +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops) +{ + int i; + + VM_BUG_ON(name =3D=3D NULL); + + spin_lock(&asi_class_lock); + + for (i =3D 1; i < ASI_MAX_NUM; i++) { + if (asi_class[i].name =3D=3D NULL) { + asi_class[i].name =3D name; + asi_class[i].flags =3D flags; + if (ops !=3D NULL) + asi_class[i].ops =3D *ops; + break; + } + } + + spin_unlock(&asi_class_lock); + + if (i =3D=3D ASI_MAX_NUM) + i =3D -ENOSPC; + + return i; +} +EXPORT_SYMBOL_GPL(asi_register_class); + +void asi_unregister_class(int index) +{ + spin_lock(&asi_class_lock); + + WARN_ON(asi_class[index].name =3D=3D NULL); + memset(&asi_class[index], 0, sizeof(struct asi_class)); + + spin_unlock(&asi_class_lock); +} +EXPORT_SYMBOL_GPL(asi_unregister_class); + +int asi_init(struct mm_struct *mm, int asi_index) +{ + struct asi *asi =3D &mm->asi[asi_index]; + + /* Index 0 is reserved for special purposes. */ + WARN_ON(asi_index =3D=3D 0 || asi_index >=3D ASI_MAX_NUM); + WARN_ON(asi->pgd !=3D NULL); + + /* + * For now, we allocate 2 pages to avoid any potential problems with + * KPTI code. This won't be needed once KPTI is folded into the ASI + * framework. 
+ */ + asi->pgd =3D (pgd_t *)__get_free_pages(GFP_PGTABLE_USER, + PGD_ALLOCATION_ORDER); + if (!asi->pgd) + return -ENOMEM; + + asi->class =3D &asi_class[asi_index]; + asi->mm =3D mm; + + return 0; +} +EXPORT_SYMBOL_GPL(asi_init); + +void asi_destroy(struct asi *asi) +{ + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); + memset(asi, 0, sizeof(struct asi)); +} +EXPORT_SYMBOL_GPL(asi_destroy); + +static void __asi_enter(void) +{ + u64 asi_cr3; + struct asi *target =3D this_cpu_read(asi_cpu_state.target_asi); + + VM_BUG_ON(preemptible()); + + if (!target || target =3D=3D this_cpu_read(asi_cpu_state.curr_asi)) + return; + + VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) =3D=3D + LOADED_MM_SWITCHING); + + this_cpu_write(asi_cpu_state.curr_asi, target); + + asi_cr3 =3D build_cr3(target->pgd, + this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + write_cr3(asi_cr3); + + if (target->class->ops.post_asi_enter) + target->class->ops.post_asi_enter(); +} + +void asi_enter(struct asi *asi) +{ + VM_WARN_ON_ONCE(!asi); + + this_cpu_write(asi_cpu_state.target_asi, asi); + barrier(); + + __asi_enter(); +} +EXPORT_SYMBOL_GPL(asi_enter); + +void asi_exit(void) +{ + u64 unrestricted_cr3; + struct asi *asi; + + preempt_disable(); + + VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) =3D=3D + LOADED_MM_SWITCHING); + + asi =3D this_cpu_read(asi_cpu_state.curr_asi); + + if (asi) { + if (asi->class->ops.pre_asi_exit) + asi->class->ops.pre_asi_exit(); + + unrestricted_cr3 =3D + build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, + this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + + write_cr3(unrestricted_cr3); + this_cpu_write(asi_cpu_state.curr_asi, NULL); + } + + preempt_enable(); +} +EXPORT_SYMBOL_GPL(asi_exit); + +void asi_init_mm_state(struct mm_struct *mm) +{ + memset(mm->asi, 0, sizeof(mm->asi)); +} diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 1895986842b9..000cbe5315f5 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -238,8 +238,9 @@ static void __init 
probe_page_size_mask(void) =20 /* By the default is everything supported: */ __default_kernel_pte_mask =3D __supported_pte_mask; - /* Except when with PTI where the kernel is mostly non-Global: */ - if (cpu_feature_enabled(X86_FEATURE_PTI)) + /* Except when with PTI or ASI where the kernel is mostly non-Global: */ + if (cpu_feature_enabled(X86_FEATURE_PTI) || + IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) __default_kernel_pte_mask &=3D ~_PAGE_GLOBAL; =20 /* Enable 1 GB linear kernel mappings if available: */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 59ba2968af1b..88d9298720dc 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -153,7 +153,7 @@ static inline u16 user_pcid(u16 asid) return ret; } =20 -static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) +inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { return __sme_pa(pgd) | kern_pcid(asid); diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild index 854c5e07e867..49fcdf9d83f5 100644 --- a/arch/xtensa/include/asm/Kbuild +++ b/arch/xtensa/include/asm/Kbuild @@ -7,3 +7,4 @@ generic-y +=3D param.h generic-y +=3D qrwlock.h generic-y +=3D qspinlock.h generic-y +=3D user.h +generic-y +=3D asi.h diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h new file mode 100644 index 000000000000..e5ba51d30b90 --- /dev/null +++ b/include/asm-generic/asi.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_GENERIC_ASI_H +#define __ASM_GENERIC_ASI_H + +/* ASI class flags */ +#define ASI_MAP_STANDARD_NONSENSITIVE 1 + +#ifndef CONFIG_ADDRESS_SPACE_ISOLATION + +#define ASI_MAX_NUM_ORDER 0 +#define ASI_MAX_NUM 0 + +#ifndef _ASSEMBLY_ + +struct asi_hooks {}; +struct asi {}; + +static inline +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops) +{ + return 0; +} + +static inline void asi_unregister_class(int asi_index) { } + +static inline void asi_init_mm_state(struct mm_struct 
*mm) { } + +static inline int asi_init(struct mm_struct *mm, int asi_index) { return 0= ; } + +static inline void asi_destroy(struct asi *asi) { } + +static inline void asi_enter(struct asi *asi) { } + +static inline void asi_set_target_unrestricted(void) { } + +static inline bool asi_is_target_unrestricted(void) { return true; } + +static inline void asi_exit(void) { } + +static inline bool is_asi_active(void) { return false; } + +static inline struct asi *asi_get_target(void) { return NULL; } + +static inline struct asi *asi_get_current(void) { return NULL; } + +#endif /* !_ASSEMBLY_ */ + +#endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ + +#endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c3a6e6209600..3de1afa57289 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -18,6 +18,7 @@ #include =20 #include +#include =20 #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 @@ -495,6 +496,8 @@ struct mm_struct { atomic_t membarrier_state; #endif =20 + struct asi asi[ASI_MAX_NUM]; + /** * @mm_users: The number of users including userspace. * diff --git a/kernel/fork.c b/kernel/fork.c index 3244cc56b697..3695a32ee9bd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -102,6 +102,7 @@ #include #include #include +#include =20 #include =20 @@ -1071,6 +1072,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, mm->def_flags =3D 0; } =20 + asi_init_mm_state(mm); + if (mm_alloc_pgd(mm)) goto fail_nopgd; =20 diff --git a/security/Kconfig b/security/Kconfig index 0b847f435beb..21b15ecaf2c1 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -65,6 +65,16 @@ config PAGE_TABLE_ISOLATION =20 See Documentation/x86/pti.rst for more details. 
=20 +config ADDRESS_SPACE_ISOLATION + bool "Allow code to run with a reduced kernel address space" + default n + depends on X86_64 && !UML + depends on !PARAVIRT + help + This feature provides the ability to run some kernel code + with a reduced kernel address space. This can be used to + mitigate some speculative execution attacks. + config SECURITY_INFINIBAND bool "Infiniband Security Hooks" depends on SECURITY && INFINIBAND --=20 2.35.1.473.g83b2b277ed-goog
From nobody Fri Jun 26 13:44:01 2026 Date: Tue, 22 Feb 2022 21:21:38 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-3-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 02/47] mm: asi: Add command-line parameter to enable/disable ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A parameter named "asi" is added, disabled by default. A feature flag X86_FEATURE_ASI is set if ASI is enabled. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 17 ++++++++++---- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 ++++++- arch/x86/mm/asi.c | 29 ++++++++++++++++++++++++ arch/x86/mm/init.c | 2 +- include/asm-generic/asi.h | 2 ++ 6 files changed, 53 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f9fc928a555d..0a4af23ed0eb 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -6,6 +6,7 @@ =20 #include #include +#include =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION =20 @@ -52,18 +53,24 @@ void asi_exit(void); =20 static inline void asi_set_target_unrestricted(void) { - barrier(); - this_cpu_write(asi_cpu_state.target_asi, NULL); + if (static_cpu_has(X86_FEATURE_ASI)) { + barrier(); + this_cpu_write(asi_cpu_state.target_asi, NULL); + } } =20 static inline struct asi *asi_get_current(void) { - return this_cpu_read(asi_cpu_state.curr_asi); + return static_cpu_has(X86_FEATURE_ASI) + ? this_cpu_read(asi_cpu_state.curr_asi) + : NULL; } =20 static inline struct asi *asi_get_target(void) { - return this_cpu_read(asi_cpu_state.target_asi); + return static_cpu_has(X86_FEATURE_ASI) + ? 
this_cpu_read(asi_cpu_state.target_asi) + : NULL; } =20 static inline bool is_asi_active(void) @@ -76,6 +83,8 @@ static inline bool asi_is_target_unrestricted(void) return !asi_get_target(); } =20 +#define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 #endif diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpuf= eatures.h index d5b5f2ab87a0..0b0ead3cdd48 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -295,6 +295,7 @@ #define X86_FEATURE_PER_THREAD_MBA (11*32+ 7) /* "" Per-thread Memory Band= width Allocation */ #define X86_FEATURE_SGX1 (11*32+ 8) /* "" Basic SGX */ #define X86_FEATURE_SGX2 (11*32+ 9) /* "" SGX Enclave Dynamic Memory Mana= gement (EDMM) */ +#define X86_FEATURE_ASI (11*32+10) /* Kernel Address Space Isolation */ =20 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/as= m/disabled-features.h index 8f28fafa98b3..9659cd9f867d 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +# define DISABLE_ASI 0 +#else +# define DISABLE_ASI (1 << (X86_FEATURE_ASI & 31)) +#endif + /* Force disable because it's broken beyond repair */ #define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31)) =20 @@ -79,7 +85,7 @@ #define DISABLED_MASK8 0 #define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX) #define DISABLED_MASK10 0 -#define DISABLED_MASK11 0 +#define DISABLED_MASK11 (DISABLE_ASI) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 9928325f3787..d274c86f89b7 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -1,5 +1,7 @@ // 
SPDX-License-Identifier: GPL-2.0 =20 +#include + #include #include #include @@ -18,6 +20,9 @@ int asi_register_class(const char *name, uint flags, { int i; =20 + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + VM_BUG_ON(name =3D=3D NULL); =20 spin_lock(&asi_class_lock); @@ -43,6 +48,9 @@ EXPORT_SYMBOL_GPL(asi_register_class); =20 void asi_unregister_class(int index) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + spin_lock(&asi_class_lock); =20 WARN_ON(asi_class[index].name =3D=3D NULL); @@ -52,10 +60,22 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); =20 +static int __init set_asi_param(char *str) +{ + if (strcmp(str, "on") =3D=3D 0) + setup_force_cpu_cap(X86_FEATURE_ASI); + + return 0; +} +early_param("asi", set_asi_param); + int asi_init(struct mm_struct *mm, int asi_index) { struct asi *asi =3D &mm->asi[asi_index]; =20 + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + /* Index 0 is reserved for special purposes. */ WARN_ON(asi_index =3D=3D 0 || asi_index >=3D ASI_MAX_NUM); WARN_ON(asi->pgd !=3D NULL); @@ -79,6 +99,9 @@ EXPORT_SYMBOL_GPL(asi_init); =20 void asi_destroy(struct asi *asi) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); memset(asi, 0, sizeof(struct asi)); } @@ -109,6 +132,9 @@ static void __asi_enter(void) =20 void asi_enter(struct asi *asi) { + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + VM_WARN_ON_ONCE(!asi); =20 this_cpu_write(asi_cpu_state.target_asi, asi); @@ -123,6 +149,9 @@ void asi_exit(void) u64 unrestricted_cr3; struct asi *asi; =20 + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + preempt_disable(); =20 VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) =3D=3D diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 000cbe5315f5..dfff17363365 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -240,7 +240,7 @@ static void __init probe_page_size_mask(void) __default_kernel_pte_mask =3D __supported_pte_mask; /* Except when with 
PTI or ASI where the kernel is mostly non-Global: */ if (cpu_feature_enabled(X86_FEATURE_PTI) || - IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) + cpu_feature_enabled(X86_FEATURE_ASI)) __default_kernel_pte_mask &=3D ~_PAGE_GLOBAL; =20 /* Enable 1 GB linear kernel mappings if available: */ diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index e5ba51d30b90..dae1403ee1d0 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -44,6 +44,8 @@ static inline struct asi *asi_get_target(void) { return N= ULL; } =20 static inline struct asi *asi_get_current(void) { return NULL; } =20 +#define static_asi_enabled() false + #endif /* !_ASSEMBLY_ */ =20 #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0465AC433FE for ; Wed, 23 Feb 2022 05:24:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238131AbiBWFYa (ORCPT ); Wed, 23 Feb 2022 00:24:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237834AbiBWFYU (ORCPT ); Wed, 23 Feb 2022 00:24:20 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 60354694AE for ; Tue, 22 Feb 2022 21:23:53 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id o5-20020a25d705000000b0062499d760easo8076106ybg.7 for ; Tue, 22 Feb 2022 21:23:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=o+K8zCWcV59JiWXD4aWine0/FDtELjiPI/RbdPNAHQc=; 
b=gL62Qc22CZ6FeqYe4RTaen0CJKVsVLhWP6XuIpy8wtIhquKE7Hq2aAgjM0cnFrMx// B9jyIFrBZPnR4ZblulS8U7l5+BX78tipmOGn+uNc7HwBgNSyCtAgwmuxuLupc9yf33fw sG7BRSGiBqlxsiI4E0nPILyGbfSDlu+DOPd9cNUxoIqlK1SF5LShFZNMYNQ5wWc1UUUq JRgPAlM6n7K8F/7k32Qk1plHhKLU8qRnhntMmXKxbAFtecCJ3vVWG8s4ryjAHjKMNW6w HoGY4yajLvjwNYPSxXLVJpFkRwFJsLHr1WAWSZWkzlHrdQQ5OWrVIF2pqwZ0e1qUQPaz mArQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=o+K8zCWcV59JiWXD4aWine0/FDtELjiPI/RbdPNAHQc=; b=qB79eIGhGkIiwG/GqSgomgsUKyuG1p8+DUfbxDAH7adv/R4SMjFSPVxjDRzMsxb5XS SwdnXvA4VoecpgZNsjujMkyIBYM/yunBTsfweWNNGazbkvpxf0b5Cut1BZu5+YXAIYGu Ju/sIhRwUZVdu8Au4hWofoERAyUm4zkd8/2FQhhD/A0vlrLFV1FrxefhED4nsontyMzk gwL8mdpjBG4G7XmX9uwzI5rXt1AXEJJYp1rkSVJSLR2unh+u8OL+4HsCSqwzGkH5SrkA +RfarES+h3idZid3wKOmH0D3MEqlIW9gy88X4GfXlIMQ4OikG4mjN1abjEW1GAzideaW VJHw== X-Gm-Message-State: AOAM530qL3SONp6ezmLt09BaADB0bs+jj8jmmI8fxHOz68fy+pnMSAHl iRDSOonVGbILmJd0TubN3BOMO13u6DLmQlW7fnuFcmjVKo+36QvQJW+GsnoEevA8qPDXdNpenGr 3/lClCtdG3TxLmIrX/juqdbZ5+m4x8eXxgcNyhQx6Raxh4R2HRJu7cwntmgBXvhOGuMImOGOO X-Google-Smtp-Source: ABdhPJyhZCUKb1E/Ab+htpmdItbzaBqk6IfSNwdQ2cMPET/cQ9m/cxN2+Idkkun2NTapLkyPe3v/k6Dz4k/E X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:a486:0:b0:61d:a523:acd0 with SMTP id g6-20020a25a486000000b0061da523acd0mr25432547ybi.203.1645593832574; Tue, 22 Feb 2022 21:23:52 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:39 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-4-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 03/47] mm: asi: Switch to unrestricted address space when entering scheduler From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, 
pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To keep things simpler, we run the scheduler only in the full unrestricted address space for the time being. Signed-off-by: Junaid Shahid --- kernel/sched/core.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 77563109c0ea..44ea197c16ea 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -19,6 +19,7 @@ =20 #include #include +#include =20 #include "../workqueue_internal.h" #include "../../fs/io-wq.h" @@ -6141,6 +6142,10 @@ static void __sched notrace __schedule(unsigned int = sched_mode) rq =3D cpu_rq(cpu); prev =3D rq->curr; =20 + /* This could possibly be delayed to just before the context switch. 
*/ + VM_WARN_ON(!asi_is_target_unrestricted()); + asi_exit(); + schedule_debug(prev, !!sched_mode); =20 if (sched_feat(HRTICK) || sched_feat(HRTICK_DL)) --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BCF9C433EF for ; Wed, 23 Feb 2022 05:24:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238150AbiBWFYc (ORCPT ); Wed, 23 Feb 2022 00:24:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237954AbiBWFYW (ORCPT ); Wed, 23 Feb 2022 00:24:22 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 597A7694B1 for ; Tue, 22 Feb 2022 21:23:55 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id x1-20020a25a001000000b0061c64ee0196so26706840ybh.9 for ; Tue, 22 Feb 2022 21:23:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=kBbjpsiAT8t80DB/Hae/sSQvzOy4duVi7ttnmtZT7iU=; b=cvWzSg8yZgLUgWZKUIjW5TSN+BW3VTrtVOKpGEnHmCq9LFApxaz2ug4Y6Y4TweNu1W f9E1LiZV7paz48nIFQ58vfAnH1K6o0bvbWTXyZRNy2eoNt9EF9GseVfZcFrpOMpaAZsv YfeUwK9fKdGDUIC+bPsLXVrJ/dI3zlbx7hzpBmIG2MmUKT7no3dFhQFnuWlscR6nzDpw D94zRcBAw6obzNOLADy5DNW08Hz9FTyO1IWbrJxXtBCcNGZOBsCLR5Y7BX69Ut6ugHh9 /ICUNuePn6v0tNGb7si9lPoIGDB/04KZ4aNfI9KEiCrUUlX4abSfJJXZ09+2AJGxuoWO 1yZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=kBbjpsiAT8t80DB/Hae/sSQvzOy4duVi7ttnmtZT7iU=; 
b=sgtaVo2O35rn+RnYqQwtFViQcppC0gYpQ6bAzqPNTYevFshnfxWxPYv5Afx1bJdkiS ddv7Jv+7jvvesZYHUCK2N0gX9ZzGDvTHGOAjBUk0ajxY3MnLqLtq8KoFG9MXl7Afa0xn rQwRcenxy0UliQ7mG7Hfv561Wc0QGEVpEJdbmba3GbS55Zgw3PE7pMp26LnmFotuIJiq K1F1K73btt68VrlxplkeMYd+L7c6pUPvsiH48wYtZF+hzR94ptDp3HHWuyzqHoQKb/8s 17e/yZwgMrCkwBRg2WX6QDrcHUqLd3cceqvE5s8D+k6/JrObn4yfsxmVbAXwopsuTAOI F6kw== X-Gm-Message-State: AOAM530g7LoJ/v0F6YriCAgKsTUbc350M3Rzykq6rFUu6vbakzxGdiQE 4zVg4qB67BgGKiXSu07r0tV+d4DTZnmQUpTsfTTDM1LBsDhS6fO/dqNCqfbB3jXxhjRYKGufLbo h+Rb2onH5BqKhfjamg+WeEQms5NXSxTULgZoADX8aiFZ4RsLqwvuzGcWBRLxEnBcHSFgRnSkW X-Google-Smtp-Source: ABdhPJyBLtJwVHx/eQ2H1TXtGc5CWguV1DYKie3D2boh+nhOh7RhoskWr4B/mahF4dAK4TfkpcUJJeAOMAIF X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a05:6902:1ca:b0:624:e2a1:2856 with SMTP id u10-20020a05690201ca00b00624e2a12856mr4238491ybh.389.1645593834551; Tue, 22 Feb 2022 21:23:54 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:40 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-5-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 04/47] mm: asi: ASI support in interrupts/exceptions From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add support for potentially switching address spaces from within interrupts/exceptions/NMIs etc. An interrupt does not automatically switch to the unrestricted address space. 
It can switch if needed to access some memory not available in the restricted address space, using the normal asi_exit call. On return from the outermost interrupt, if the target address space was the restricted address space (e.g. we were in the critical code path between ASI Enter and VM Enter), the restricted address space will be automatically restored. Otherwise, execution will continue in the unrestricted address space until the next explicit ASI Enter. In order to keep track of when to restore the restricted address space, an interrupt/exception nesting depth counter is maintained per-task. An alternative implementation without needing this counter is also possible, but the counter unlocks an additional nice-to-have benefit by allowing detection of whether or not we are currently executing inside an exception context, which would be useful in a later patch. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 35 ++++++++++++++++++++++++++++++++ arch/x86/include/asm/idtentry.h | 25 +++++++++++++++++++++-- arch/x86/include/asm/processor.h | 5 +++++ arch/x86/kernel/process.c | 2 ++ arch/x86/kernel/traps.c | 2 ++ arch/x86/mm/asi.c | 3 ++- kernel/entry/common.c | 6 ++++++ 7 files changed, 75 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 0a4af23ed0eb..7702332c62e8 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -4,6 +4,8 @@ =20 #include =20 +#include + #include #include #include @@ -51,6 +53,11 @@ void asi_destroy(struct asi *asi); void asi_enter(struct asi *asi); void asi_exit(void); =20 +static inline void asi_init_thread_state(struct thread_struct *thread) +{ + thread->intr_nest_depth =3D 0; +} + static inline void asi_set_target_unrestricted(void) { if (static_cpu_has(X86_FEATURE_ASI)) { @@ -85,6 +92,34 @@ static inline bool asi_is_target_unrestricted(void) =20 #define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) =20 +static inline void 
asi_intr_enter(void) +{ + if (static_cpu_has(X86_FEATURE_ASI)) { + current->thread.intr_nest_depth++; + barrier(); + } +} + +static inline void asi_intr_exit(void) +{ + void __asi_enter(void); + + if (static_cpu_has(X86_FEATURE_ASI)) { + barrier(); + + if (--current->thread.intr_nest_depth =3D=3D 0) + __asi_enter(); + } +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +static inline void asi_intr_enter(void) { } + +static inline void asi_intr_exit(void) { } + +static inline void asi_init_thread_state(struct thread_struct *thread) { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 #endif diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentr= y.h index 1345088e9902..ea5cdc90403d 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -10,6 +10,7 @@ #include =20 #include +#include =20 /** * DECLARE_IDTENTRY - Declare functions for simple IDT entry points @@ -133,7 +134,16 @@ static __always_inline void __##func(struct pt_regs *r= egs, \ * is required before the enter/exit() helpers are invoked. */ #define DEFINE_IDTENTRY_RAW(func) \ -__visible noinstr void func(struct pt_regs *regs) +static __always_inline void __##func(struct pt_regs *regs); \ + \ +__visible noinstr void func(struct pt_regs *regs) \ +{ \ + asi_intr_enter(); \ + __##func (regs); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs) =20 /** * DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry po= ints @@ -161,7 +171,18 @@ __visible noinstr void func(struct pt_regs *regs) * is required before the enter/exit() helpers are invoked. 
*/ #define DEFINE_IDTENTRY_RAW_ERRORCODE(func) \ -__visible noinstr void func(struct pt_regs *regs, unsigned long error_code) +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code); \ + \ +__visible noinstr void func(struct pt_regs *regs, unsigned long error_code= )\ +{ \ + asi_intr_enter(); \ + __##func (regs, error_code); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code) =20 /** * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/proces= sor.h index 355d38c0cf60..20116efd2756 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -519,6 +519,11 @@ struct thread_struct { unsigned int iopl_warn:1; unsigned int sig_on_uaccess_err:1; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* The nesting depth of exceptions/interrupts */ + int intr_nest_depth; +#endif + /* * Protection Keys Register for Userspace. Loaded immediately on * context switch. 
Store it in thread_struct to avoid a lookup in diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 04143a653a8a..c8d4a00a4de7 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -90,6 +90,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct = task_struct *src) #ifdef CONFIG_VM86 dst->thread.vm86 =3D NULL; #endif + asi_init_thread_state(&dst->thread); + /* Drop the copied pointer to current's fpstate */ dst->thread.fpu.fpstate =3D NULL; =20 diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c9d566dcf89a..acf675ddda96 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -61,6 +61,7 @@ #include #include #include +#include =20 #ifdef CONFIG_X86_64 #include @@ -413,6 +414,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault) } #endif =20 + asi_exit(); irqentry_nmi_enter(regs); instrumentation_begin(); notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index d274c86f89b7..2453124f221d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -107,12 +107,13 @@ void asi_destroy(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_destroy); =20 -static void __asi_enter(void) +void __asi_enter(void) { u64 asi_cr3; struct asi *target =3D this_cpu_read(asi_cpu_state.target_asi); =20 VM_BUG_ON(preemptible()); + VM_BUG_ON(current->thread.intr_nest_depth !=3D 0); =20 if (!target || target =3D=3D this_cpu_read(asi_cpu_state.curr_asi)) return; diff --git a/kernel/entry/common.c b/kernel/entry/common.c index d5a61d565ad5..9064253085c7 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -9,6 +9,8 @@ =20 #include "common.h" =20 +#include + #define CREATE_TRACE_POINTS #include =20 @@ -321,6 +323,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs = *regs) .exit_rcu =3D false, }; =20 + asi_intr_enter(); + if (user_mode(regs)) { irqentry_enter_from_user_mode(regs); return ret; @@ -416,6 +420,7 @@ noinstr void irqentry_exit(struct pt_regs 
*regs, irqent= ry_state_t state) instrumentation_end(); rcu_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); + asi_intr_exit(); return; } =20 @@ -438,6 +443,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqent= ry_state_t state) if (state.exit_rcu) rcu_irq_exit(); } + asi_intr_exit(); } =20 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs) --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83FA1C433F5 for ; Wed, 23 Feb 2022 05:24:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238168AbiBWFYl (ORCPT ); Wed, 23 Feb 2022 00:24:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237992AbiBWFYY (ORCPT ); Wed, 23 Feb 2022 00:24:24 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABF6A69CFE for ; Tue, 22 Feb 2022 21:23:57 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id b12-20020a056902030c00b0061d720e274aso26585917ybs.20 for ; Tue, 22 Feb 2022 21:23:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=LI1Qxw8YovJHKPYFPYoCf4datmDsbayp3z46DL8Kp8Q=; b=U7xDPHt7nJyB2GTAgVc+5mVNa5D9AiEIkeTSJpYyuShkSe5G+Xo37Cmu4z2S0ODbd3 Wm5Eeq8cvy6TNvAv3A7bWuNv++i7NTg2KMLHM/m4eHjfRoQqwVJ2y/Yi5MDF7xrDbsQI j5wRAvkCT4cYk3Bc/Ve5H0pU1FD7r6VuCExamGfvXcE67Mlg9W1l1RFlwTf48DP0g9j8 zrQbvmaWMqoIe6J4oo0xoCcquiVzfC7OAHWIqYk7V7cZwcAdfqwpYM8UaM/ZPk1iYlJM o8gS8/KkwFxTjJeXBM9VLMqxRp+ccKcT56QtTvv+HXupkPEETHBW4Z+e7zRt00MQFKUT ixgA== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=LI1Qxw8YovJHKPYFPYoCf4datmDsbayp3z46DL8Kp8Q=; b=MYd/DEBU0YO7vo0i+L0cMWECEq1OYVjT0ex6drnvqUgfKVVz5OufSMngCnRq+hiIyC kMr2JzcvbPiJdSHwrFb7FTuCKMJKUUHea6fVanekTLhGk4WgZ4pi+hNYgPHloAaNIze5 s8a5mz8MWPgayUuZu56vAzsFkP4lOfqyWMfLEoSNpMhLD65xzAxtTw2rI2ZTCNsGZi5H 8qDx6LPazxnbqqoswHlGA7RnwhaCXTF1kOGE93trVYm++TXrKPT7Qc6KH5BZjQo6+NBF i9wVgKbUp/9O80C26xG3SWaOBZb9zmZ5mma+jAFfiHxTM/JXLHxIJpuBfULZR+/5qeSG ILgQ== X-Gm-Message-State: AOAM532S/NYfm+3MKn4iL4tX6Y09xnfdobV3VA9M6lAi1K4Kz58eGCNn zxel16wvmund7HOfEEcVIObXgSEHWtgutxFDo7AzqvoZkQvf6qWo24qRel39eSsysg4GJr369X9 QtASa+Gyc0gylMAK8apiEHBt9Esd8tDki/ku2eQ1tlMyo4ryseufQOn8LklFzNIPm9Z67PLKN X-Google-Smtp-Source: ABdhPJxfznelYhWgkiOo62fJh4kZWSbldwDVUL5AZ4s809e7D3FsyXFvg8XSNCQZyLZ5Vxg+VGGrfUoGnpy5 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:5cc3:0:b0:2d0:a2d0:9c0e with SMTP id q186-20020a815cc3000000b002d0a2d09c0emr27666033ywb.270.1645593836836; Tue, 22 Feb 2022 21:23:56 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:41 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-6-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 05/47] mm: asi: Make __get_current_cr3_fast() ASI-aware From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When 
ASI is active, __get_current_cr3_fast() adjusts the returned CR3 value accordingly to reflect the actual ASI CR3. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 7 +++++++ arch/x86/mm/tlb.c | 20 ++++++++++++++++++-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 7702332c62e8..95557211dabd 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -112,6 +112,11 @@ static inline void asi_intr_exit(void) } } =20 +static inline pgd_t *asi_pgd(struct asi *asi) +{ + return asi->pgd; +} + #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 static inline void asi_intr_enter(void) { } @@ -120,6 +125,8 @@ static inline void asi_intr_exit(void) { } =20 static inline void asi_init_thread_state(struct thread_struct *thread) { } =20 +static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 #endif diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 88d9298720dc..25bee959d1d3 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -17,6 +17,7 @@ #include #include #include +#include =20 #include "mm_internal.h" =20 @@ -1073,12 +1074,27 @@ void flush_tlb_kernel_range(unsigned long start, un= signed long end) */ unsigned long __get_current_cr3_fast(void) { - unsigned long cr3 =3D build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pg= d, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + unsigned long cr3; + pgd_t *pgd; + u16 asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); + struct asi *asi =3D asi_get_current(); + + if (asi) + pgd =3D asi_pgd(asi); + else + pgd =3D this_cpu_read(cpu_tlbstate.loaded_mm)->pgd; + + cr3 =3D build_cr3(pgd, asid); =20 /* For now, be very restrictive about when this can be called. */ VM_WARN_ON(in_nmi() || preemptible()); =20 + /* + * CR3 is unstable if the target ASI is unrestricted + * and a restricted ASI is currently loaded. 
+ */ + VM_WARN_ON_ONCE(asi && asi_is_target_unrestricted()); + VM_BUG_ON(cr3 !=3D __read_cr3()); return cr3; } --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73DD2C4332F for ; Wed, 23 Feb 2022 05:24:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238153AbiBWFYg (ORCPT ); Wed, 23 Feb 2022 00:24:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238123AbiBWFY2 (ORCPT ); Wed, 23 Feb 2022 00:24:28 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE3416A047 for ; Tue, 22 Feb 2022 21:23:59 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d306e372e5so163250677b3.5 for ; Tue, 22 Feb 2022 21:23:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=PIzleJhvx1e4in5SU2cz59acEkt7I4gZXxP/VePrhcs=; b=UOhpXDHCnae+DRnauRxZtdjiTln+VoGZhoV1LC2HwZ0wE6hCMEDx6ZsM5m4zhmzU8C zHAJJxzPwGGULan8LLoI7C3OSmGzlkgZIkeoNkliXHKxR2zxRtRuDwE2nmiSS1c4BXRL MiuUyWSM6+TUYE5RqqQekigHUWwoMR7es7zfFE2KzZ5ooe/XoTE9TQgA48klkJO1uSey E656AcQf7rLEbAA6uzuZiKaqjksL8iwXpjTOMvdQBp3iyrK4xwvnjKTRQ5lhnl0vyEeu pioGnqi8o2SeqNRitqaZjawpEPOB4cWAR7yNk2BDoyC4/ldrIIsdHyj3n/nSEdlK5M0a gypQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=PIzleJhvx1e4in5SU2cz59acEkt7I4gZXxP/VePrhcs=; 
b=Ipj/5KE1k0ES7yurFMTzfPS58abKskOdWc/E+l4HHvqsBBsTK84cWq2bnbXX4SWf1i UlZBmO0hL2qEv++npNlMW+aaz1VL7nwzmZbUhNaK9SoiF7Ei05604FVRinuxGpNn0p7j 4ZKitdEPKLzRr/vE5g+3s6x6u52ciN+H5HVBG8olpVqthm5AdTPXXwrs9u5zP+EXXcLe WuN29kfYlWT1t54xbxgoMgVFqGXzlyLCJjS/DecH0JKxPEwK1V0nwGoUjHXlIIhAhIMw +ZyjfO9oHksrDyYP4A+sIM5z221LmmhsxdedgVWcmVuHRqQj6VO7SEB8tNTafThFfQTu gAiA== X-Gm-Message-State: AOAM530O4HsYO4s97+Z9snFTcsil8+sxpxL/fV3iTuYa9vk0hFuX3cZH TzDUEh++Tkwbloqprj9DR6YN0UmZlEEn9shiY/t924P3BTexLTrn8PEv2dpPRrMlw1unAQzSPvC Jnhjr3J9e78uVU24jsQm1qKSEZce+u7OFQeT/m1qY4E77v3z8gz/fAOiZODXbcVXlrWUtyWWL X-Google-Smtp-Source: ABdhPJyZtnc6SgOpidUjkYpln4RLDN4W6lG8iW37Mn66M+p2QX3sw1OB0/Qqkeh8xQlAuLXN36L1X25C92Rt X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:c607:0:b0:2ca:287c:6b6c with SMTP id i7-20020a0dc607000000b002ca287c6b6cmr28060793ywd.17.1645593839000; Tue, 22 Feb 2022 21:23:59 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:42 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-7-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 06/47] mm: asi: ASI page table allocation and free functions From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds custom allocation and free functions for ASI page tables. 
The alloc functions support allocating memory using different GFP reclaim flags, in order to be able to support non-sensitive allocations from both standard and atomic contexts. They also install the page tables locklessly, which makes it slightly simpler to handle non-sensitive allocations from interrupts/exceptions. The free functions recursively free the page tables when the ASI instance is being torn down. Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 109 +++++++++++++++++++++++++++++++++++++++- include/linux/pgtable.h | 3 ++ 2 files changed, 111 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 2453124f221d..40d772b2e2a8 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -60,6 +60,113 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); =20 +#ifndef mm_inc_nr_p4ds +#define mm_inc_nr_p4ds(mm) do {} while (false) +#endif + +#ifndef mm_dec_nr_p4ds +#define mm_dec_nr_p4ds(mm) do {} while (false) +#endif + +#define pte_offset pte_offset_kernel + +#define DEFINE_ASI_PGTBL_ALLOC(base, level) \ +static level##_t * asi_##level##_alloc(struct asi *asi, \ + base##_t *base, ulong addr, \ + gfp_t flags) \ +{ \ + if (unlikely(base##_none(*base))) { \ + ulong pgtbl =3D get_zeroed_page(flags); \ + phys_addr_t pgtbl_pa; \ + \ + if (pgtbl =3D=3D 0) \ + return NULL; \ + \ + pgtbl_pa =3D __pa(pgtbl); \ + paravirt_alloc_##level(asi->mm, PHYS_PFN(pgtbl_pa)); \ + \ + if (cmpxchg((ulong *)base, 0, \ + pgtbl_pa | _PAGE_TABLE) =3D=3D 0) { \ + mm_inc_nr_##level##s(asi->mm); \ + } else { \ + paravirt_release_##level(PHYS_PFN(pgtbl_pa)); \ + free_page(pgtbl); \ + } \ + \ + /* NOP on native. PV call on Xen. 
*/ \ + set_##base(base, *base); \ + } \ + VM_BUG_ON(base##_large(*base)); \ + return level##_offset(base, addr); \ +} + +DEFINE_ASI_PGTBL_ALLOC(pgd, p4d) +DEFINE_ASI_PGTBL_ALLOC(p4d, pud) +DEFINE_ASI_PGTBL_ALLOC(pud, pmd) +DEFINE_ASI_PGTBL_ALLOC(pmd, pte) + +#define asi_free_dummy(asi, addr) +#define __pmd_free(mm, pmd) free_page((ulong)(pmd)) +#define pud_page_vaddr(pud) ((ulong)pud_pgtable(pud)) +#define p4d_page_vaddr(p4d) ((ulong)p4d_pgtable(p4d)) + +static inline unsigned long pte_page_vaddr(pte_t pte) +{ + return (unsigned long)__va(pte_val(pte) & PTE_PFN_MASK); +} + +#define DEFINE_ASI_PGTBL_FREE(level, LEVEL, next, free) \ +static void asi_free_##level(struct asi *asi, ulong pgtbl_addr) \ +{ \ + uint i; \ + level##_t *level =3D (level##_t *)pgtbl_addr; \ + \ + for (i =3D 0; i < PTRS_PER_##LEVEL; i++) { \ + ulong vaddr; \ + \ + if (level##_none(level[i])) \ + continue; \ + \ + vaddr =3D level##_page_vaddr(level[i]); \ + \ + if (!level##_leaf(level[i])) \ + asi_free_##next(asi, vaddr); \ + else \ + VM_WARN(true, "Lingering mapping in ASI %p at %lx",\ + asi, vaddr); \ + } \ + paravirt_release_##level(PHYS_PFN(__pa(pgtbl_addr))); \ + free(asi->mm, level); \ + mm_dec_nr_##level##s(asi->mm); \ +} + +DEFINE_ASI_PGTBL_FREE(pte, PTE, dummy, pte_free_kernel) +DEFINE_ASI_PGTBL_FREE(pmd, PMD, pte, __pmd_free) +DEFINE_ASI_PGTBL_FREE(pud, PUD, pmd, pud_free) +DEFINE_ASI_PGTBL_FREE(p4d, P4D, pud, p4d_free) + +static void asi_free_pgd_range(struct asi *asi, uint start, uint end) +{ + uint i; + + for (i =3D start; i < end; i++) + if (pgd_present(asi->pgd[i])) + asi_free_p4d(asi, (ulong)p4d_offset(asi->pgd + i, 0)); +} + +/* + * Free the page tables allocated for the given ASI instance. + * The caller must ensure that all the mappings have already been cleared + * and appropriate TLB flushes have been issued before calling this functi= on. 
+ */ +static void asi_free_pgd(struct asi *asi) +{ + VM_BUG_ON(asi->mm =3D=3D &init_mm); + + asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD); + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); +} + static int __init set_asi_param(char *str) { if (strcmp(str, "on") =3D=3D 0) @@ -102,7 +209,7 @@ void asi_destroy(struct asi *asi) if (!boot_cpu_has(X86_FEATURE_ASI)) return; =20 - free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); + asi_free_pgd(asi); memset(asi, 0, sizeof(struct asi)); } EXPORT_SYMBOL_GPL(asi_destroy); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e24d2c992b11..2fff17a939f0 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1593,6 +1593,9 @@ typedef unsigned int pgtbl_mod_mask; #ifndef pmd_leaf #define pmd_leaf(x) 0 #endif +#ifndef pte_leaf +#define pte_leaf(x) 1 +#endif =20 #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1B9CC433F5 for ; Wed, 23 Feb 2022 05:24:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238164AbiBWFYi (ORCPT ); Wed, 23 Feb 2022 00:24:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238134AbiBWFYb (ORCPT ); Wed, 23 Feb 2022 00:24:31 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 281776A03A for ; Tue, 22 Feb 2022 21:24:01 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id r14-20020a5b018e000000b00624f6f97bf4so306331ybl.12 for ; Tue, 22 Feb 2022 21:24:01 -0800 (PST) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=4OZ+JPmRPuew/8tnwetStHaUs2Nh17IXa7mxxiAMwj8=; b=kb4u7VjrnfodFdsXodDcDOsogb6ps0VFuoU2Ch/2lXixZ/HfftZw3kdktGLZNqJaaS sS3He2L5FPw70IJb8AKJo1NDhJWYKpPBXq/OhYV1okDxsl0mvsUmyEav/HUJeog/Gj9y Q5Dc8pVb07blL49n409GgYRDrcKVQVQRVFNARfm5uMsXWp3Yd02cr7kkTrGmuAR1Xu1E sxdm0LFpj5VhSie+6tVTejQjNAf6nTCGVLsZ8GtzBSgzjYfKnMkFSPkXEgjwcV5k4Zwo pwZnNuNgZRjJww5tQ8PUPFXrFLOBqqQa1ttWIBXg8Sv/mjjUaM/rS2VlzHPfgRo69Xqa n3Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=4OZ+JPmRPuew/8tnwetStHaUs2Nh17IXa7mxxiAMwj8=; b=mjX82iQqchgPbCci/bK9CiospSmNe82b7JRe3rb03GtbKNmIQlSeo0qlirWbRRJcmY joAaKScWO3DrxUM23Dr3PbinaG/G9ku31KqDB3bE4ew9UCCwar57HuTDr00lv+K0Vwls //2micSrQBKebvRvVWXa9iwNT1Pb+vA8EZgg2JS4mzhH+0TgCl7tjAa31DHiSu1+RbVu HLyHt6esgwwTFH59gGO2lOOjuUM2aX6Yi5GUu+ICKqaTu3DsD57l45EtHyKNQ+3Y1NLT 23DPnYE30astfi9rlisrb1EIXdfWZtwydLIcNcWtX0Irwp7afdr8OlIE8iQ1mthG7Rwg nLYA== X-Gm-Message-State: AOAM530JUhBN1hXwxfNuEamv5GSS5ohEY0a9r+4ACwr1qaXv8NsNSVut iNQjMGDedmVw0SH7J8/zhY0Rv7NE7UUfXOJsNWsph3fmK2S4snWQwdSqR72eoj/j5dDim6rllXK sbdOXHlMmXhNgNPmiKSy/3yejfnYh8CJaJdg/3ZTuhshsiBH4yjstDm64Yn0n37+9VnzrM3XV X-Google-Smtp-Source: ABdhPJznarXNTwMlwRo1R3Ew9ob69h2spGBZggFbqvzN1k2HjtUx3qHkoNOb8UlGcjxUj887oMmqSttakWi9 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:eb09:0:b0:2d1:e0df:5104 with SMTP id u9-20020a0deb09000000b002d1e0df5104mr27667696ywe.250.1645593841036; Tue, 22 Feb 2022 21:24:01 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:43 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-8-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 
2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 07/47] mm: asi: Functions to map/unmap a memory range into ASI page tables
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Two functions, asi_map() and asi_map_gfp(), are added to allow mapping
memory into ASI page tables. The mapping will be identical to the one
for the same virtual address in the unrestricted page tables. This is
necessary to allow switching between the page tables at any arbitrary
point in the kernel.

Another function, asi_unmap(), is added to allow unmapping memory mapped
via asi_map*().

Signed-off-by: Junaid Shahid
---
 arch/x86/include/asm/asi.h |   5 +
 arch/x86/mm/asi.c          | 196 +++++++++++++++++++++++++++++++++++++
 include/asm-generic/asi.h  |  19 ++++
 mm/internal.h              |   3 +
 mm/vmalloc.c               |  60 +++++++-----
 5 files changed, 261 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 95557211dabd..521b40d1864b 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -53,6 +53,11 @@ void asi_destroy(struct asi *asi);
 void asi_enter(struct asi *asi);
 void asi_exit(void);
 
+int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags);
+int asi_map(struct asi *asi, void *addr, size_t len);
+void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb);
+void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len);
+
 static inline void asi_init_thread_state(struct thread_struct *thread)
 {
 	thread->intr_nest_depth = 0;
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 40d772b2e2a8..84d220cbdcfc 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -6,6 +6,8 @@
 #include
 #include
 
+#include "../../../mm/internal.h"
+
 #undef pr_fmt
 #define pr_fmt(fmt)     "ASI: " fmt
 
@@ -287,3 +289,197 @@ void asi_init_mm_state(struct mm_struct *mm)
 {
 	memset(mm->asi, 0, sizeof(mm->asi));
 }
+
+static bool is_page_within_range(size_t addr, size_t page_size,
+				 size_t range_start, size_t range_end)
+{
+	size_t page_start, page_end, page_mask;
+
+	page_mask = ~(page_size - 1);
+	page_start = addr & page_mask;
+	page_end = page_start + page_size;
+
+	return page_start >= range_start && page_end <= range_end;
+}
+
+static bool follow_physaddr(struct mm_struct *mm, size_t virt,
+			    phys_addr_t *phys, size_t *page_size, ulong *flags)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+#define follow_addr_at_level(base, level, LEVEL)			\
+	do {								\
+		*page_size = LEVEL##_SIZE;				\
+		level = level##_offset(base, virt);			\
+		if (!level##_present(*level))				\
+			return false;					\
+									\
+		if (level##_large(*level)) {				\
+			*phys = PFN_PHYS(level##_pfn(*level)) |		\
+				(virt & ~LEVEL##_MASK);			\
+			*flags = level##_flags(*level);			\
+			return true;					\
+		}							\
+	} while (false)
+
+	follow_addr_at_level(mm, pgd, PGDIR);
+	follow_addr_at_level(pgd, p4d, P4D);
+	follow_addr_at_level(p4d, pud, PUD);
+	follow_addr_at_level(pud, pmd, PMD);
+
+	*page_size = PAGE_SIZE;
+	pte = pte_offset_map(pmd, virt);
+	if (!pte)
+		return false;
+
+	if (!pte_present(*pte)) {
+		pte_unmap(pte);
+		return false;
+	}
+
+	*phys = PFN_PHYS(pte_pfn(*pte)) | (virt & ~PAGE_MASK);
+	*flags = pte_flags(*pte);
+
+	pte_unmap(pte);
+	return true;
+
+#undef follow_addr_at_level
+}
+
+/*
+ * Map the given range into the ASI page tables. The source of the mapping
+ * is the regular unrestricted page tables.
+ * Can be used to map any kernel memory.
+ *
+ * The caller MUST ensure that the source mapping will not change during this
+ * function. For dynamic kernel memory, this is generally ensured by mapping
+ * the memory within the allocator.
+ *
+ * If the source mapping is a large page and the range being mapped spans the
+ * entire large page, then it will be mapped as a large page in the ASI page
+ * tables too. If the range does not span the entire huge page, then it will
+ * be mapped as smaller pages. In that case, the implementation is slightly
+ * inefficient, as it will walk the source page tables again for each small
+ * destination page, but that should be ok for now, as usually in such cases,
+ * the range would consist of a small-ish number of pages.
+ */
+int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags)
+{
+	size_t virt;
+	size_t start = (size_t)addr;
+	size_t end = start + len;
+	size_t page_size;
+
+	if (!static_cpu_has(X86_FEATURE_ASI))
+		return 0;
+
+	VM_BUG_ON(start & ~PAGE_MASK);
+	VM_BUG_ON(len & ~PAGE_MASK);
+	VM_BUG_ON(start < TASK_SIZE_MAX);
+
+	gfp_flags &= GFP_RECLAIM_MASK;
+
+	if (asi->mm != &init_mm)
+		gfp_flags |= __GFP_ACCOUNT;
+
+	for (virt = start; virt < end; virt = ALIGN(virt + 1, page_size)) {
+		pgd_t *pgd;
+		p4d_t *p4d;
+		pud_t *pud;
+		pmd_t *pmd;
+		pte_t *pte;
+		phys_addr_t phys;
+		ulong flags;
+
+		if (!follow_physaddr(asi->mm, virt, &phys, &page_size, &flags))
+			continue;
+
+#define MAP_AT_LEVEL(base, BASE, level, LEVEL) {			\
+			if (base##_large(*base)) {			\
+				VM_BUG_ON(PHYS_PFN(phys & BASE##_MASK) !=	\
+					  base##_pfn(*base));		\
+				continue;				\
+			}						\
+									\
+			level = asi_##level##_alloc(asi, base, virt, gfp_flags);\
+			if (!level)					\
+				return -ENOMEM;				\
+									\
+			if (page_size >= LEVEL##_SIZE &&		\
+			    (level##_none(*level) || level##_leaf(*level)) &&	\
+			    is_page_within_range(virt, LEVEL##_SIZE,	\
+						 start, end)) {		\
+				page_size = LEVEL##_SIZE;		\
+				phys &= LEVEL##_MASK;			\
+									\
+				if (level##_none(*level))		\
+					set_##level(level,		\
+						    __##level(phys | flags));	\
+				else					\
+					VM_BUG_ON(level##_pfn(*level) !=	\
+						  PHYS_PFN(phys));	\
+				continue;				\
+			}						\
+		}
+
+		pgd = pgd_offset_pgd(asi->pgd, virt);
+
+		MAP_AT_LEVEL(pgd, PGDIR, p4d, P4D);
+		MAP_AT_LEVEL(p4d, P4D, pud, PUD);
+		MAP_AT_LEVEL(pud, PUD, pmd, PMD);
+		MAP_AT_LEVEL(pmd, PMD, pte, PAGE);
+
+		VM_BUG_ON(true); /* Should never reach here. */
+#undef MAP_AT_LEVEL
+	}
+
+	return 0;
+}
+
+int asi_map(struct asi *asi, void *addr, size_t len)
+{
+	return asi_map_gfp(asi, addr, len, GFP_KERNEL);
+}
+
+/*
+ * Unmap a kernel address range previously mapped into the ASI page tables.
+ * The caller must ensure appropriate TLB flushing.
+ *
+ * The area being unmapped must be a whole previously mapped region (or regions)
+ * Unmapping a partial subset of a previously mapped region is not supported.
+ * That will work, but may end up unmapping more than what was asked for, if
+ * the mapping contained huge pages.
+ *
+ * Note that higher order direct map allocations are allowed to be partially
+ * freed. If it turns out that that actually happens for any of the
+ * non-sensitive allocations, then the above limitation may be a problem. For
+ * now, vunmap_pgd_range() will emit a warning if this situation is detected.
+ */
+void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb)
+{
+	size_t start = (size_t)addr;
+	size_t end = start + len;
+	pgtbl_mod_mask mask = 0;
+
+	if (!static_cpu_has(X86_FEATURE_ASI) || !len)
+		return;
+
+	VM_BUG_ON(start & ~PAGE_MASK);
+	VM_BUG_ON(len & ~PAGE_MASK);
+	VM_BUG_ON(start < TASK_SIZE_MAX);
+
+	vunmap_pgd_range(asi->pgd, start, end, &mask, false);
+
+	if (flush_tlb)
+		asi_flush_tlb_range(asi, addr, len);
+}
+
+void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len)
+{
+	/* Later patches will do a more optimized flush. */
+	flush_tlb_kernel_range((ulong)addr, (ulong)addr + len);
+}
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index dae1403ee1d0..7da91cbe075d 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_GENERIC_ASI_H
 #define __ASM_GENERIC_ASI_H
 
+#include
+
 /* ASI class flags */
 #define ASI_MAP_STANDARD_NONSENSITIVE	1
 
@@ -44,6 +46,23 @@ static inline struct asi *asi_get_target(void) { return NULL; }
 
 static inline struct asi *asi_get_current(void) { return NULL; }
 
+static inline
+int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags)
+{
+	return 0;
+}
+
+static inline int asi_map(struct asi *asi, void *addr, size_t len)
+{
+	return 0;
+}
+
+static inline
+void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { }
+
+static inline
+void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { }
+
 #define static_asi_enabled() false
 
 #endif /* !_ASSEMBLY_ */
diff --git a/mm/internal.h b/mm/internal.h
index 3b79a5c9427a..ae8799d86dd3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -79,6 +79,9 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
+void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end,
+		      pgtbl_mod_mask *mask, bool sleepable);
+
 void do_page_cache_ra(struct readahead_control *, unsigned long nr_to_read,
 		unsigned long lookahead_size);
 void force_page_cache_ra(struct readahead_control *, unsigned long nr);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d2a00ad4e1dd..f2ef719f1cba 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -336,7 +336,7 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
-			     pgtbl_mod_mask *mask)
+			     pgtbl_mod_mask *mask, bool sleepable)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -350,18 +350,22 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		if (cleared || pmd_bad(*pmd))
 			*mask |= PGTBL_PMD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(addr & ~PMD_MASK);
+			WARN_ON(next & ~PMD_MASK);
 			continue;
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
 
-		cond_resched();
+		if (sleepable)
+			cond_resched();
 	} while (pmd++, addr = next, addr != end);
 }
 
 static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
-			     pgtbl_mod_mask *mask)
+			     pgtbl_mod_mask *mask, bool sleepable)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -375,16 +379,19 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (cleared || pud_bad(*pud))
 			*mask |= PGTBL_PUD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(addr & ~PUD_MASK);
+			WARN_ON(next & ~PUD_MASK);
 			continue;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
-		vunmap_pmd_range(pud, addr, next, mask);
+		vunmap_pmd_range(pud, addr, next, mask, sleepable);
 	} while (pud++, addr = next, addr != end);
 }
 
 static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
-			     pgtbl_mod_mask *mask)
+			     pgtbl_mod_mask *mask, bool sleepable)
 {
 	p4d_t *p4d;
 	unsigned long next;
@@ -398,14 +405,35 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 		if (cleared || p4d_bad(*p4d))
 			*mask |= PGTBL_P4D_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(addr & ~P4D_MASK);
+			WARN_ON(next & ~P4D_MASK);
 			continue;
+		}
 		if (p4d_none_or_clear_bad(p4d))
 			continue;
-		vunmap_pud_range(p4d, addr, next, mask);
+		vunmap_pud_range(p4d, addr, next, mask, sleepable);
 	} while (p4d++, addr = next, addr != end);
 }
 
+void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end,
+		      pgtbl_mod_mask *mask, bool sleepable)
+{
+	unsigned long next;
+	pgd_t *pgd = pgd_offset_pgd(pgd_table, addr);
+
+	BUG_ON(addr >= end);
+
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_bad(*pgd))
+			*mask |= PGTBL_PGD_MODIFIED;
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		vunmap_p4d_range(pgd, addr, next, mask, sleepable);
+	} while (pgd++, addr = next, addr != end);
+}
+
 /*
  * vunmap_range_noflush is similar to vunmap_range, but does not
  * flush caches or TLBs.
@@ -420,21 +448,9 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
  */
 void vunmap_range_noflush(unsigned long start, unsigned long end)
 {
-	unsigned long next;
-	pgd_t *pgd;
-	unsigned long addr = start;
 	pgtbl_mod_mask mask = 0;
 
-	BUG_ON(addr >= end);
-	pgd = pgd_offset_k(addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_bad(*pgd))
-			mask |= PGTBL_PGD_MODIFIED;
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		vunmap_p4d_range(pgd, addr, next, &mask);
-	} while (pgd++, addr = next, addr != end);
+	vunmap_pgd_range(init_mm.pgd, start, end, &mask, true);
 
 	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
 		arch_sync_kernel_mappings(start, end);
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9ABEDC433F5 for ; Wed, 23 Feb 2022 05:24:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238242AbiBWFY6 (ORCPT ); Wed, 23 Feb 2022 00:24:58 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237992AbiBWFYx (ORCPT ); Wed, 23 Feb 2022 00:24:53 -0500
Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4212F6A036 for ; Tue, 22 Feb 2022 21:24:04 -0800 (PST)
Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d5258e8a4bso162217137b3.22
for ; Tue, 22 Feb 2022 21:24:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=rf+PEwSCAjx96OEampym4lZ49QLmyoHK2vvPo63tKDU=; b=ClDNkRcR05xwUIt+/3suGkjgzNs7jx8crmYYSBy7YIfyMwXpzxpsuO7L5+YX8vxRfh 1E4PSbB3uEBG+XkPHeMZghVckls3s0XgENlVkTaPWZjkwqbjhFcBMXRc0pNft0u4lgb5 OHkuB86fnETxs+Q/fn9lSwDJhVOhXOzfA1+wbVB6gzJyDg9tXEAFaT4CQ0u+HBjQvZbK xaLThn55HGR54AO8ID3Zi3c8s3jxcmEzd75ic5aZDYq6ErsSaIKBY/8EVr+KDDNXwZv5 bVHGd/T3R6cR1QrhXBnqEawoY6M01XywGeiIoQ1z1l9A03h+cXgMumLSGriK6mCj3H15 pxlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=rf+PEwSCAjx96OEampym4lZ49QLmyoHK2vvPo63tKDU=; b=Vcy8E+DX60rioAhbKpo0rPMrSLcD6X2gk0iSSLFdhJmju8ELl5qPX9szm/eA7XmcvK NhsqcIoeTEWdm7oQWWPRvGaiI8zV/JGjnUJqcPGhck01sw/wqIwhLlPLnBhxD83rL0oq ANXxh2reQfs9yL9dpghDH6v00D5gIiDOoKy1tdN8yxZySUBJ8AtMHEDBSATZs3WuKyzA 7AISDPejaJh9mbA7rcMo/6OWPazpAt3l9i9szRWnuUzwhGnb+EnXKiW4Sj/nRNs6KAB2 sACjRT/uFl5JdFapXHy8mN2D4CE5fC/+2KR0r/YFWk5gGE5siDrvPPdWIQ55/v2eXCaP 4eDQ== X-Gm-Message-State: AOAM533n5xDdpOsTiBF6ylqMI/en7DP/nx5W/4lDoNyrXoG3G+03tKAR uKcEPVqHRMpNLLO6b3o7kTt5vDdcIAF84jjCdiPUBk4qILivQd/NaJ7zzGj6KoszMPyfKPnI/Mc O+AnCFnmbjS+uqaHuDi0ml9VaGJOyBN7SpBQ8WrJwvGNGs+/NUPv/GqO+n/b3I+k6G821l4D0 X-Google-Smtp-Source: ABdhPJxAE8giXqGIDJh7Mviar4JsaaTJZtQQuDBTZF65UTbW8rTNqxt0+PJdM/jBytNMKHkW6V66FJSXIQ6n X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a05:6902:108:b0:621:165e:5c1e with SMTP id o8-20020a056902010800b00621165e5c1emr25436069ybh.204.1645593843385; Tue, 22 Feb 2022 21:24:03 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:44 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-9-junaids@google.com> Mime-Version: 1.0 References: 
<20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 08/47] mm: asi: Add basic infrastructure for global non-sensitive mappings
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A pseudo-PGD is added to store global non-sensitive ASI mappings.
Actual ASI PGDs copy entries from this pseudo-PGD during asi_init().
Memory can be mapped as globally non-sensitive by calling asi_map()
with ASI_GLOBAL_NONSENSITIVE.

Page tables allocated for global non-sensitive mappings are never
freed.

Signed-off-by: Junaid Shahid
---
 arch/x86/include/asm/asi.h | 12 ++++++++++++
 arch/x86/mm/asi.c          | 36 +++++++++++++++++++++++++++++++++++-
 arch/x86/mm/init_64.c      | 26 +++++++++++++++++---------
 arch/x86/mm/mm_internal.h  |  3 +++
 include/asm-generic/asi.h  |  5 +++++
 mm/init-mm.c               |  2 ++
 6 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 521b40d1864b..64c2b4d1dba2 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -15,6 +15,8 @@
 #define ASI_MAX_NUM_ORDER	2
 #define ASI_MAX_NUM		(1 << ASI_MAX_NUM_ORDER)
 
+#define ASI_GLOBAL_NONSENSITIVE	(&init_mm.asi[0])
+
 struct asi_state {
 	struct asi *curr_asi;
 	struct asi *target_asi;
@@ -41,6 +43,8 @@ struct asi {
 
 DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state);
 
+extern pgd_t asi_global_nonsensitive_pgd[];
+
 void asi_init_mm_state(struct mm_struct *mm);
 
 int  asi_register_class(const char *name, uint flags,
@@ -117,6 +121,14 @@ static inline void asi_intr_exit(void)
 	}
 }
 
+#define INIT_MM_ASI(init_mm)						\
+	.asi = {							\
+		[0] = {							\
+			.pgd = asi_global_nonsensitive_pgd,		\
+			.mm = &init_mm					\
+		}							\
+	},
+
 static inline pgd_t *asi_pgd(struct asi *asi)
 {
 	return asi->pgd;
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 84d220cbdcfc..d381ae573af9 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include
+#include
 
 #include
 #include
 #include
 
+#include "mm_internal.h"
 #include "../../../mm/internal.h"
 
 #undef pr_fmt
@@ -17,6 +19,8 @@ static DEFINE_SPINLOCK(asi_class_lock);
 DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state);
 EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state);
 
+__aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD];
+
 int asi_register_class(const char *name, uint flags,
 		       const struct asi_hooks *ops)
 {
@@ -160,12 +164,17 @@ static void asi_free_pgd_range(struct asi *asi, uint start, uint end)
  * Free the page tables allocated for the given ASI instance.
  * The caller must ensure that all the mappings have already been cleared
  * and appropriate TLB flushes have been issued before calling this function.
+ *
+ * For standard non-sensitive ASI classes, the page tables shared with the
+ * master pseudo-PGD are not freed.
  */
 static void asi_free_pgd(struct asi *asi)
 {
 	VM_BUG_ON(asi->mm == &init_mm);
 
-	asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD);
+	if (!(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE))
+		asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD);
+
 	free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER);
 }
 
@@ -178,6 +187,24 @@ static int __init set_asi_param(char *str)
 }
 early_param("asi", set_asi_param);
 
+static int __init asi_global_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_ASI))
+		return 0;
+
+	preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd,
+				    PAGE_OFFSET,
+				    PAGE_OFFSET + PFN_PHYS(max_possible_pfn) - 1,
+				    "ASI Global Non-sensitive direct map");
+
+	preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd,
+				    VMALLOC_START, VMALLOC_END,
+				    "ASI Global Non-sensitive vmalloc");
+
+	return 0;
+}
+subsys_initcall(asi_global_init)
+
 int asi_init(struct mm_struct *mm, int asi_index)
 {
 	struct asi *asi = &mm->asi[asi_index];
@@ -202,6 +229,13 @@ int asi_init(struct mm_struct *mm, int asi_index)
 	asi->class = &asi_class[asi_index];
 	asi->mm = mm;
 
+	if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) {
+		uint i;
+
+		for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++)
+			set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(asi_init);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 36098226a957..ebd512c64ed0 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1277,18 +1277,15 @@ static void __init register_page_bootmem_info(void)
 #endif
 }
 
-/*
- * Pre-allocates page-table pages for the vmalloc area in the kernel page-table.
- * Only the level which needs to be synchronized between all page-tables is
- * allocated because the synchronization can be expensive.
- */
-static void __init preallocate_vmalloc_pages(void)
+void __init preallocate_toplevel_pgtbls(pgd_t *pgd_table,
+					ulong start, ulong end,
+					const char *name)
 {
 	unsigned long addr;
 	const char *lvl;
 
-	for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
-		pgd_t *pgd = pgd_offset_k(addr);
+	for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+		pgd_t *pgd = pgd_offset_pgd(pgd_table, addr);
 		p4d_t *p4d;
 		pud_t *pud;
 
@@ -1324,7 +1321,18 @@ static void __init preallocate_vmalloc_pages(void)
 	 * The pages have to be there now or they will be missing in
 	 * process page-tables later.
 	 */
-	panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl);
+	panic("Failed to pre-allocate %s pages for %s area\n", lvl, name);
+}
+
+/*
+ * Pre-allocates page-table pages for the vmalloc area in the kernel page-table.
+ * Only the level which needs to be synchronized between all page-tables is
+ * allocated because the synchronization can be expensive.
+ */
+static void __init preallocate_vmalloc_pages(void)
+{
+	preallocate_toplevel_pgtbls(init_mm.pgd, VMALLOC_START, VMALLOC_END,
+				    "vmalloc");
 }
 
 void __init mem_init(void)
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index 3f37b5c80bb3..a1e8c523ab08 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -19,6 +19,9 @@ unsigned long kernel_physical_mapping_change(unsigned long start,
 					     unsigned long page_size_mask);
 void zone_sizes_init(void);
 
+void preallocate_toplevel_pgtbls(pgd_t *pgd_table, ulong start, ulong end,
+				 const char *name);
+
 extern int after_bootmem;
 
 void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache);
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index 7da91cbe075d..012691e29895 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -12,6 +12,8 @@
 #define ASI_MAX_NUM_ORDER		0
 #define ASI_MAX_NUM			0
 
+#define ASI_GLOBAL_NONSENSITIVE		NULL
+
 #ifndef _ASSEMBLY_
 
 struct asi_hooks {};
@@ -63,8 +65,11 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { }
 static inline
 void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { }
 
+#define INIT_MM_ASI(init_mm)
+
 #define static_asi_enabled() false
 
+
 #endif /* !_ASSEMBLY_ */
 
 #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */
diff --git a/mm/init-mm.c b/mm/init-mm.c
index b4a6f38fb51d..47a6a66610fb 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef INIT_MM_CONTEXT
 #define INIT_MM_CONTEXT(name)
@@ -38,6 +39,7 @@ struct mm_struct init_mm = {
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
+	INIT_MM_ASI(init_mm)
 	INIT_MM_CONTEXT(init_mm)
 };
 
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received:
from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41365C433EF for ; Wed, 23 Feb 2022 05:25:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238215AbiBWFZb (ORCPT ); Wed, 23 Feb 2022 00:25:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238181AbiBWFYy (ORCPT ); Wed, 23 Feb 2022 00:24:54 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00B786AA63 for ; Tue, 22 Feb 2022 21:24:06 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d07ae1145aso162060697b3.4 for ; Tue, 22 Feb 2022 21:24:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=A2kZT9A4P8bW0lI9YOtf1wQA9jJkQ1koWvhOh//ocUo=; b=fbSsueZGuBScgzqcY02VIl0ILFLXpPlKdSo5ctJYl8kSPI7w0kDZ1uGY8ujtC+fl+e PEzsS5j+6h2lyRr/UrTBatrXvU4f3NErvUAR7v4X1dzKu33iDdHE2QCAop5cuA3atf5U EJ4V1Fx706MmP14iSjNuJdP3U5u2B16C5Fj3e/UoRc8fZC4srXBJUArKZJ67Qxmu0hNA BsPOvLUKtyHj9nuWAemSw1G1+wDLXUYyYOFHHc5fb0JSBY7PBPNYf4tS05DMNxCxy6ed 1VKdWkC7HesYHDOsvNnF5BnhxEmTplkx92VqdXNvb7r03sGqA/SDTfz8vVfIQeADD5L1 ir/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=A2kZT9A4P8bW0lI9YOtf1wQA9jJkQ1koWvhOh//ocUo=; b=rUshBPEAU9TvpiLZ7h8fDoROMy9Kmaxns506Fmm2/9httuso15ookwNlK9sghp0vMv Kp09GNvoi3uPJWuvX9lZQ7Xs2fCyGAtWr0oxCIKtKC4PVLrGF/PthNsYZChINEt/7E3S /SkflYp+oI6s4EM41xa0Ej5odZI0+glT3/E+Xw3SczfRaZKBhYRE09uaV46oSzF7vM++ +PgeBg71t1TQBBS+cXmk5UGpYKcQLKyD1T6G0ci/xzHh6ZfcHysWWmbBSRwIYzNo03xb z4+YvAlNWW5CSK80b08ELQ3KRjzLUnvDmWJKRoP/BCLIDwJ9YbWl1op/F8wfQOdBt/E2 swZA== 
X-Gm-Message-State: AOAM532M3Q8nOg/nO/ZzPR8/VbC9UrAq1RLt89EVEf4tAG4hNmn8uxZX
	YcYiHAY4NV/zx2wQdh+PdD/b2uIjNJwOv4PI+VELO8TEozfgFWzrbi2pfqO3FrPxrAD7fY8SjI4
	h0xtHrWm+eE7sWbPqGRiEnjhf4gvl+kv164PrsEdcxfXjMBBWG5VeckwVlUfosdvDFFua2X6O
X-Google-Smtp-Source: ABdhPJx1fLXJ2Nd+XyAyj3SwtD/6yx9YWclO/RpwDKJJBLCMeXbY/FNMqCpTASogEOd+LsJ96uLWXmi1HY+9
X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1d8c:0:b0:2cb:da76:5da8 with SMTP id d134-20020a811d8c000000b002cbda765da8mr27707177ywd.165.1645593845809; Tue, 22 Feb 2022 21:24:05 -0800 (PST)
Date: Tue, 22 Feb 2022 21:21:45 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-10-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 09/47] mm: Add __PAGEFLAG_FALSE
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

__PAGEFLAG_FALSE is a non-atomic equivalent of PAGEFLAG_FALSE.
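
As a rough illustration of what the no-op variants boil down to, the
userspace sketch below re-creates the macro pattern with stand-in types.
The stand-in struct page/struct folio and the demo_flag_stays_clear()
helper are assumptions made purely for illustration; the macro names
mirror those in include/linux/page-flags.h, but the bodies are simplified
and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Stand-in types: hypothetical, only so the sketch builds in userspace. */
struct page { unsigned long flags; };
struct folio { struct page page; };

/* Test helpers that always report the flag as clear. */
#define TESTPAGEFLAG_FALSE(uname, lname)				\
static inline bool folio_test_##lname(const struct folio *folio)	\
{ (void)folio; return false; }						\
static inline int Page##uname(const struct page *page)			\
{ (void)page; return 0; }

/* Non-atomic set helpers that do nothing. */
#define __SETPAGEFLAG_NOOP(uname, lname)				\
static inline void __folio_set_##lname(struct folio *folio)		\
{ (void)folio; }							\
static inline void __SetPage##uname(struct page *page)			\
{ (void)page; }

/* Non-atomic clear helpers that do nothing. */
#define __CLEARPAGEFLAG_NOOP(uname, lname)				\
static inline void __folio_clear_##lname(struct folio *folio)		\
{ (void)folio; }							\
static inline void __ClearPage##uname(struct page *page)		\
{ (void)page; }

#define __PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
	__SETPAGEFLAG_NOOP(uname, lname) __CLEARPAGEFLAG_NOOP(uname, lname)

/* Instantiate the helpers for a flag that is compiled out. */
__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive)

/* Hypothetical helper: returns 1 if "setting" the flag changed nothing. */
int demo_flag_stays_clear(void)
{
	struct page page = { .flags = 0 };

	__SetPageGlobalNonSensitive(&page);
	return PageGlobalNonSensitive(&page) == 0 && page.flags == 0;
}
```

With this pattern, callers can use the __SetPage/__ClearPage helpers
unconditionally, and the calls should compile away when the flag is
configured out.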
Signed-off-by: Junaid Shahid
---
 include/linux/page-flags.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b5f14d581113..b90a17e9796d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -390,6 +390,10 @@ static inline int Page##uname(const struct page *page) { return 0; }
 static inline void folio_set_##lname(struct folio *folio) { }		\
 static inline void SetPage##uname(struct page *page) {  }
 
+#define __SETPAGEFLAG_NOOP(uname, lname)				\
+static inline void __folio_set_##lname(struct folio *folio) { }	\
+static inline void __SetPage##uname(struct page *page) {  }
+
 #define CLEARPAGEFLAG_NOOP(uname, lname)				\
 static inline void folio_clear_##lname(struct folio *folio) { }	\
 static inline void ClearPage##uname(struct page *page) {  }
@@ -411,6 +415,9 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
 	SETPAGEFLAG_NOOP(uname, lname) CLEARPAGEFLAG_NOOP(uname, lname)
 
+#define __PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
+	__SETPAGEFLAG_NOOP(uname, lname) __CLEARPAGEFLAG_NOOP(uname, lname)
+
 #define TESTSCFLAG_FALSE(uname, lname)					\
 	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)
 
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A87EC433EF for ; Wed, 23 Feb 2022 05:24:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235836AbiBWFZM (ORCPT ); Wed, 23 Feb 2022 00:25:12 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236018AbiBWFYz (ORCPT );
Wed, 23 Feb 2022 00:24:55 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E8A46C1F3 for ; Tue, 22 Feb 2022 21:24:08 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id l3-20020a25ad43000000b0062462e2af34so11456599ybe.17 for ; Tue, 22 Feb 2022 21:24:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=gMgbv5PxdERvilfG13W1/59w3Y4MPufgzH9ipT5mZcE=; b=b6urhxkWGfnvarQIxVi9atsKIrKLwBXqKa9HpQGtESTNRqGIm9vkGf3iqdrTrF3WZR 5m1o2lHCH4V6NfBZsxlk8GhGN3fJPzlR3rYLD19suBuPDuixp1NPgPg6U6hpUzZvTY5L IgBE58eLo2ZZzbqp5xEwBHiVFVtDJ87nrKLza6quhghsvVIu54Q0IHEcJ3cNhC/sxTwh jZftmgJSZ3zhdFt219bc3VfY34rKbPUl3RGNBdUPaeK8YOXIcRYVBQ85kA7yXhGQqR5+ DMF/TwW/QA+HqXhMMcxPWsADaTGbuLYcULrIcRh5s/jS84XHXJ0eFx/xBfm7XL6hFrwa +MKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=gMgbv5PxdERvilfG13W1/59w3Y4MPufgzH9ipT5mZcE=; b=q17eb0ZHA3+ghCF1kyzSt2krPVdFPyEijngIgeUGezGR33OPCU1lGM8xMjSiSGGQrL 6h3vtdFx1IGa5oguZnzp7LubH9zTsNdM4p/ottPetnM8EpHsDeTXvpJaA7WjWCuWP7FP J62mqiG3E5yCO4/1f0+dF+BHHFbtJjJtOcKAKZegqTEN6fL062ruCu4Vbw7lP6SDfN8Y 5iMHxmQqttx1WqFjvNl46yV8b4KV44eme1OvLdqNGkR49GUuNHZzKTgJCECk+DI7ppsI Vh1Gyd1lNn62dkGLmhrs0FOCSlPTTODafZMQeDawtddi1Fm2IB50+xscelH1pJWWGpDw kaRg== X-Gm-Message-State: AOAM5323M4X00rMS81u1oqUeoYPhJv3NqBkb9hGZKQVtaM33Br0ICOuP fXZzerD9qnZkUzwl7qbpvoPjfpaf9OlR+zZS47E0f4Ad9kkkkOKskzX3FbMc6lXuRSKOBWnD0t6 lMtEdsaIsz0V31KO3B5+P/JlO/HMZCFo85CKqI9GVEZ0ijCxYj4lk/NXnQZA0GxZHsWtJOlvq X-Google-Smtp-Source: ABdhPJxT/vWX0OjiErFxevki8JUky5j473dKCodPTix92VCbsXfh3r9zdc2C7QnCBcd9ntaYhKL50joC8J3g X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 
2002:a81:5d0:0:b0:2d0:d056:c703 with SMTP id 199-20020a8105d0000000b002d0d056c703mr28080196ywf.288.1645593847896; Tue, 22 Feb 2022 21:24:07 -0800 (PST)
Date: Tue, 22 Feb 2022 21:21:46 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-11-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 10/47] mm: asi: Support for global non-sensitive direct map allocations
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A new GFP flag is added to specify that an allocation should be
considered globally non-sensitive. The pages will be mapped into the
ASI global non-sensitive pseudo-PGD, which is shared between all
standard ASI instances. A new page flag is also added so that when
these pages are freed, they can also be unmapped from the ASI page
tables.
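
To make the bit arithmetic concrete: the flag value 0x4000000 used by
this patch is bit 26, which is consistent with __GFP_BITS_SHIFT being
set to 27. The sketch below checks that relationship in plain userspace
C. The DEMO_-prefixed names and the demo_is_global_nonsensitive() helper
are hypothetical stand-ins, not kernel identifiers; only the two numeric
values are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the diff; the DEMO_ names are hypothetical. */
#define DEMO_GFP_GLOBAL_NONSENSITIVE	0x4000000u
#define DEMO_GFP_BITS_SHIFT		27
#define DEMO_GFP_BITS_MASK		((1u << DEMO_GFP_BITS_SHIFT) - 1)

/* The new flag is a single bit (bit 26) ... */
static_assert(DEMO_GFP_GLOBAL_NONSENSITIVE == (1u << 26),
	      "flag is bit 26");
/* ... and it has to be covered by the GFP bits mask. */
static_assert((DEMO_GFP_GLOBAL_NONSENSITIVE & DEMO_GFP_BITS_MASK) ==
	      DEMO_GFP_GLOBAL_NONSENSITIVE,
	      "flag fits inside the mask");

/* Hypothetical helper: was the non-sensitive flag requested? */
bool demo_is_global_nonsensitive(unsigned int gfp_flags)
{
	return (gfp_flags & DEMO_GFP_GLOBAL_NONSENSITIVE) != 0;
}
```

A caller would OR the flag into its usual GFP mask; when the config
option is off, the diff defines the flag as 0, so such a test would
always be false.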
Signed-off-by: Junaid Shahid --- include/linux/gfp.h | 10 ++- include/linux/mm_types.h | 5 ++ include/linux/page-flags.h | 9 ++ include/trace/events/mmflags.h | 12 ++- mm/page_alloc.c | 145 ++++++++++++++++++++++++++++++++- tools/perf/builtin-kmem.c | 1 + 6 files changed, 178 insertions(+), 4 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 8fcc38467af6..07a99a463a34 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -60,6 +60,11 @@ struct vm_area_struct; #else #define ___GFP_NOLOCKDEP 0 #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define ___GFP_GLOBAL_NONSENSITIVE 0x4000000u +#else +#define ___GFP_GLOBAL_NONSENSITIVE 0 +#endif /* If the above are modified, __GFP_BITS_SHIFT may need updating */ =20 /* @@ -248,8 +253,11 @@ struct vm_area_struct; /* Disable lockdep for GFP context tracking */ #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) =20 +/* Allocate non-sensitive memory */ +#define __GFP_GLOBAL_NONSENSITIVE ((__force gfp_t)___GFP_GLOBAL_NONSENSITI= VE) + /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT 27 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) =20 /** diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3de1afa57289..5b8028fcfe67 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -191,6 +191,11 @@ struct page { =20 /** @rcu_head: You can use this to free a page by RCU. */ struct rcu_head rcu_head; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Links the pages_to_free_async list */ + struct llist_node async_free_node; +#endif }; =20 union { /* This union is 4 bytes in size. 
*/ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b90a17e9796d..a07434cc679c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -140,6 +140,9 @@ enum pageflags { #endif #ifdef CONFIG_KASAN_HW_TAGS PG_skip_kasan_poison, +#endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + PG_global_nonsensitive, #endif __NR_PAGEFLAGS, =20 @@ -542,6 +545,12 @@ TESTCLEARFLAG(Young, young, PF_ANY) PAGEFLAG(Idle, idle, PF_ANY) #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY); +#else +__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive); +#endif + #ifdef CONFIG_KASAN_HW_TAGS PAGEFLAG(SkipKASanPoison, skip_kasan_poison, PF_HEAD) #else diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 116ed4d5d0f8..73a49197ef54 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -50,7 +50,8 @@ {(unsigned long)__GFP_DIRECT_RECLAIM, "__GFP_DIRECT_RECLAIM"},\ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\ {(unsigned long)__GFP_ZEROTAGS, "__GFP_ZEROTAGS"}, \ - {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"}\ + {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"},\ + {(unsigned long)__GFP_GLOBAL_NONSENSITIVE, "__GFP_GLOBAL_NONSENSITIVE"}\ =20 #define show_gfp_flags(flags) \ (flags) ? 
__print_flags(flags, "|", \ @@ -93,6 +94,12 @@ #define IF_HAVE_PG_SKIP_KASAN_POISON(flag,string) #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define IF_HAVE_ASI(flag, string) ,{1UL << flag, string} +#else +#define IF_HAVE_ASI(flag, string) +#endif + #define __def_pageflag_names \ {1UL << PG_locked, "locked" }, \ {1UL << PG_waiters, "waiters" }, \ @@ -121,7 +128,8 @@ IF_HAVE_PG_HWPOISON(PG_hwpoison, "hwpoison" ) \ IF_HAVE_PG_IDLE(PG_young, "young" ) \ IF_HAVE_PG_IDLE(PG_idle, "idle" ) \ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) \ -IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") +IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") \ +IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") =20 #define show_page_flags(flags) \ (flags) ? __print_flags(flags, "|", \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..a4048fa1868a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -697,7 +697,7 @@ static inline bool pcp_allowed_order(unsigned int order) return false; } =20 -static inline void free_the_page(struct page *page, unsigned int order) +static inline void __free_the_page(struct page *page, unsigned int order) { if (pcp_allowed_order(order)) /* Via pcp? */ free_unref_page(page, order); @@ -705,6 +705,14 @@ static inline void free_the_page(struct page *page, un= signed int order) __free_pages_ok(page, order, FPI_NONE); } =20 +static bool asi_unmap_freed_pages(struct page *page, unsigned int order); + +static inline void free_the_page(struct page *page, unsigned int order) +{ + if (asi_unmap_freed_pages(page, order)) + __free_the_page(page, order); +} + /* * Higher-order pages are called "compound pages". 
They are structured th= usly: * @@ -5162,6 +5170,129 @@ static inline bool prepare_alloc_pages(gfp_t gfp_ma= sk, unsigned int order, return true; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static DEFINE_PER_CPU(struct work_struct, async_free_work); +static DEFINE_PER_CPU(struct llist_head, pages_to_free_async); +static bool async_free_work_initialized; + +static void __free_the_page(struct page *page, unsigned int order); + +static void async_free_work_fn(struct work_struct *work) +{ + struct page *page, *tmp; + struct llist_node *pages_to_free; + void *va; + size_t len; + uint order; + + pages_to_free =3D llist_del_all(this_cpu_ptr(&pages_to_free_async)); + + /* A later patch will do a more optimized TLB flush. */ + + llist_for_each_entry_safe(page, tmp, pages_to_free, async_free_node) { + va =3D page_to_virt(page); + order =3D page->private; + len =3D PAGE_SIZE * (1 << order); + + asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, va, len); + __free_the_page(page, order); + } +} + +static int __init asi_page_alloc_init(void) +{ + int cpu; + + if (!static_asi_enabled()) + return 0; + + for_each_possible_cpu(cpu) + INIT_WORK(per_cpu_ptr(&async_free_work, cpu), + async_free_work_fn); + + /* + * This function is called before SMP is initialized, so we can assume + * that this is the only running CPU at this point. 
+ */ + + barrier(); + async_free_work_initialized =3D true; + barrier(); + + if (!llist_empty(this_cpu_ptr(&pages_to_free_async))) + queue_work_on(smp_processor_id(), mm_percpu_wq, + this_cpu_ptr(&async_free_work)); + + return 0; +} +early_initcall(asi_page_alloc_init); + +static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_= mask) +{ + uint i; + + if (!static_asi_enabled()) + return 0; + + if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) { + for (i =3D 0; i < (1 << order); i++) + __SetPageGlobalNonSensitive(page + i); + + return asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page), + PAGE_SIZE * (1 << order), gfp_mask); + } + + return 0; +} + +static bool asi_unmap_freed_pages(struct page *page, unsigned int order) +{ + void *va; + size_t len; + bool async_flush_needed; + + if (!static_asi_enabled()) + return true; + + if (!PageGlobalNonSensitive(page)) + return true; + + va =3D page_to_virt(page); + len =3D PAGE_SIZE * (1 << order); + async_flush_needed =3D irqs_disabled() || in_interrupt(); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, va, len, !async_flush_needed); + + if (!async_flush_needed) + return true; + + page->private =3D order; + llist_add(&page->async_free_node, this_cpu_ptr(&pages_to_free_async)); + + if (async_free_work_initialized) + queue_work_on(smp_processor_id(), mm_percpu_wq, + this_cpu_ptr(&async_free_work)); + + return false; +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +static inline +int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask) +{ + return 0; +} + +static inline +bool asi_unmap_freed_pages(struct page *page, unsigned int order) +{ + return true; +} + +#endif + /* * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or ar= ray * @gfp: GFP flags for the allocation @@ -5345,6 +5476,9 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int or= der, int preferred_nid, return NULL; } =20 + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + gfp |=3D __GFP_ZERO; + gfp &=3D 
gfp_allowed_mask; /* * Apply scoped allocation constraints. This is mainly about GFP_NOFS @@ -5388,6 +5522,15 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int o= rder, int preferred_nid, page =3D NULL; } =20 + if (page) { + int err =3D asi_map_alloced_pages(page, order, gfp); + + if (unlikely(err)) { + __free_pages(page, order); + page =3D NULL; + } + } + trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype); =20 return page; diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index da03a341c63c..5857953cd5c1 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -660,6 +660,7 @@ static const struct { { "__GFP_RECLAIM", "R" }, { "__GFP_DIRECT_RECLAIM", "DR" }, { "__GFP_KSWAPD_RECLAIM", "KR" }, + { "__GFP_GLOBAL_NONSENSITIVE", "GNS" }, }; =20 static size_t max_gfp_len; --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 137D0C433FE for ; Wed, 23 Feb 2022 05:25:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236536AbiBWFZ0 (ORCPT ); Wed, 23 Feb 2022 00:25:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238226AbiBWFY4 (ORCPT ); Wed, 23 Feb 2022 00:24:56 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 293F46C938 for ; Tue, 22 Feb 2022 21:24:11 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id x1-20020a25a001000000b0061c64ee0196so26707336ybh.9 for ; Tue, 22 Feb 2022 21:24:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; 
h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ZeYN5P9RWE85rWG989hD+TytaZQUQYUlju5cKer3rg4=; b=ojUJsUEL9fv13WjcZCmN21LxwR3Z6zqiLpmAQSzxCgc5C9RlKzd4dJEPqIdea+S3R3 SjjKai+zXn4VwTi+vC8vw+NDJWHGEkF4oYybv8WUgGBs6X4Bacon5BeJ8KDM6+Kjcqg7 fYXSLYWwuUfF3ZzaHWiW5r5fUjU5hUbi5i8udL9OUGPQxFLT6BZJN8cKiUG2KSidmSdI 7UxdullkH2J71pHhwQMQjuVH7Fyee/6awuNUgdr31148ohQL1xZz1MmOnDGr6y/5AC05 atbUvniVVa19eg7E36/l3gFWXXYduMYcF6MD2CAuUTHeqJzSqvtlhwblL5Fzis+kQ6SQ claQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ZeYN5P9RWE85rWG989hD+TytaZQUQYUlju5cKer3rg4=; b=o9GmaIOCn6pQbQKn5THjsSAlIleirmL3MOiK+T+4DKwrqEqb+Aza/sSkYjeRpmkTxh Antvovjn1M8rlNArB6AfTmixOCKBISntYngT2A1z8QTBJV4uGXM2egnfeOjiHjB0COj8 7OqvCeqhcvQZ0+ozkDFx1Jr0kqT0I1ye8jLvbxJhmHZhbjRM73E1ShwHol9k1VIpe1xq z92OPnUmFqgSyvsCOFNRWZaf3NW125K686cOuAaCWaN4MEU6Fhl6iJ1ROcp//hdO8Wbe Tugal4yoBoAnsN3fAQxU4zNeKjYuNI0f8KP9N/Yd119oG5Dn6Ti7NoN2xAz1TtdntJ84 ByHg== X-Gm-Message-State: AOAM531uBFCGA2PxTd8GZBvuYvgL0egZY1sHCqmvEOfGVl1yaT9vRZiS Dg2EUfNmdNfd725jAk3XRTexR6hXHNNznnJ1cUjF/SxKKC4RSK28fQcuafKovIc1AOXk7yXZI18 x7sS4vwOG0v9cZvmHFQCAOm66aqnPLg80lisCJ1fCUZ9ACfmILrwtd5omfI8yLE1IifD1vaTo X-Google-Smtp-Source: ABdhPJx7Y4lP1+DVrEJyeaCPqdjSHoYWrsMiEOQNVQ/p8prXsXPbox8dmisMikykjaQnA+B25uCbwTFY5Vh8 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a5b:589:0:b0:61d:de51:9720 with SMTP id l9-20020a5b0589000000b0061dde519720mr26317731ybp.167.1645593850281; Tue, 22 Feb 2022 21:24:10 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:47 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-12-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 11/47] mm: asi: Global 
non-sensitive vmalloc/vmap support From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A new flag, VM_GLOBAL_NONSENSITIVE, is added to designate globally non-sensitive vmalloc/vmap areas. When using the __vmalloc / __vmalloc_node APIs, if the corresponding GFP flag is specified, the VM flag is automatically added. When using the __vmalloc_node_range API, either flag can be specified independently. The VM flag will only map the vmalloc area as non-sensitive, while the GFP flag will only map the underlying direct map area as non-sensitive. When using the __vmalloc_node_range API, instead of VMALLOC_START/END, VMALLOC_GLOBAL_NONSENSITIVE_START/END should be used. This is to keep these mappings separate from locally non-sensitive vmalloc areas, which will be added later. Areas outside of the standard vmalloc range can specify the range as before.
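As an illustration only (not part of this patch), the relationship between the two flags can be modeled in plain userspace C. This is a minimal sketch that assumes ASI is enabled. `struct model_mapping`, `model_vmalloc_node_range()`, `model_vmalloc_node()` and the `MODEL_*` constants are hypothetical names invented for the sketch; the two bit values are copied from this series.

```c
#include <stdbool.h>

/* Bit values copied from this series; the MODEL_ names are hypothetical. */
#define MODEL_GFP_GLOBAL_NONSENSITIVE 0x4000000u
#define MODEL_VM_GLOBAL_NONSENSITIVE  0x00000800ul

/* Which of the two views of the memory ends up globally non-sensitive. */
struct model_mapping {
	bool vmalloc_area_nonsensitive; /* controlled by the VM flag */
	bool direct_map_nonsensitive;   /* controlled by the GFP flag */
};

/* Models __vmalloc_node_range(): the two flags are independent. */
static struct model_mapping model_vmalloc_node_range(unsigned int gfp,
						     unsigned long vm_flags)
{
	struct model_mapping m;

	m.vmalloc_area_nonsensitive =
		(vm_flags & MODEL_VM_GLOBAL_NONSENSITIVE) != 0;
	m.direct_map_nonsensitive =
		(gfp & MODEL_GFP_GLOBAL_NONSENSITIVE) != 0;
	return m;
}

/* Models __vmalloc_node(): the VM flag is derived from the GFP flag. */
static struct model_mapping model_vmalloc_node(unsigned int gfp)
{
	unsigned long vm_flags = 0;

	if (gfp & MODEL_GFP_GLOBAL_NONSENSITIVE)
		vm_flags |= MODEL_VM_GLOBAL_NONSENSITIVE;

	return model_vmalloc_node_range(gfp, vm_flags);
}
```

With `model_vmalloc_node(MODEL_GFP_GLOBAL_NONSENSITIVE)` both views come out non-sensitive, whereas `model_vmalloc_node_range(0, MODEL_VM_GLOBAL_NONSENSITIVE)` exposes only the vmalloc area and leaves the direct map alias sensitive.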
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/pgtable_64_types.h | 5 +++ arch/x86/mm/asi.c | 3 +- include/asm-generic/asi.h | 3 ++ include/linux/vmalloc.h | 6 +++ mm/vmalloc.c | 53 ++++++++++++++++++++++--- 5 files changed, 64 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm= /pgtable_64_types.h index 91ac10654570..0fc380ba25b8 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -141,6 +141,11 @@ extern unsigned int ptrs_per_p4d; =20 #define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1) =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END +#endif + #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) /* The module sections ends with the start of the fixmap */ #ifndef CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index d381ae573af9..71348399baf1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -198,7 +198,8 @@ static int __init asi_global_init(void) "ASI Global Non-sensitive direct map"); =20 preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd, - VMALLOC_START, VMALLOC_END, + VMALLOC_GLOBAL_NONSENSITIVE_START, + VMALLOC_GLOBAL_NONSENSITIVE_END, "ASI Global Non-sensitive vmalloc"); =20 return 0; diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 012691e29895..f918cd052722 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -14,6 +14,9 @@ =20 #define ASI_GLOBAL_NONSENSITIVE NULL =20 +#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END + #ifndef _ASSEMBLY_ =20 struct asi_hooks {}; diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 6e022cc712e6..c7c66decda3e 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -39,6 +39,12 @@ struct notifier_block; /* 
in notifier.h */ * determine which allocations need the module shadow freed. */ =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define VM_GLOBAL_NONSENSITIVE 0x00000800 /* Similar to __GFP_GLOBAL_NONSE= NSITIVE */ +#else +#define VM_GLOBAL_NONSENSITIVE 0 +#endif + /* bits [20..32] reserved for arch specific ioremap internals */ =20 /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f2ef719f1cba..ba588a37ee75 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2393,6 +2393,33 @@ void __init vmalloc_init(void) vmap_initialized =3D true; } =20 +static int asi_map_vm_area(struct vm_struct *area) +{ + if (!static_asi_enabled()) + return 0; + + if (area->flags & VM_GLOBAL_NONSENSITIVE) + return asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr, + get_vm_area_size(area)); + + return 0; +} + +static void asi_unmap_vm_area(struct vm_struct *area) +{ + if (!static_asi_enabled()) + return; + + /* + * TODO: The TLB flush here could potentially be avoided in + * the case when the existing flush from try_purge_vmap_area_lazy() + * and/or vm_unmap_aliases() happens non-lazily. + */ + if (area->flags & VM_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, area->addr, + get_vm_area_size(area), true); +} + static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { @@ -2570,6 +2597,7 @@ static void vm_remove_mappings(struct vm_struct *area= , int deallocate_pages) int flush_dmap =3D 0; int i; =20 + asi_unmap_vm_area(area); remove_vm_area(area->addr); =20 /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. 
*/ @@ -2787,16 +2815,20 @@ void *vmap(struct page **pages, unsigned int count, =20 addr =3D (unsigned long)area->addr; if (vmap_pages_range(addr, addr + size, pgprot_nx(prot), - pages, PAGE_SHIFT) < 0) { - vunmap(area->addr); - return NULL; - } + pages, PAGE_SHIFT) < 0) + goto err; + + if (asi_map_vm_area(area)) + goto err; =20 if (flags & VM_MAP_PUT_PAGES) { area->pages =3D pages; area->nr_pages =3D count; } return area->addr; +err: + vunmap(area->addr); + return NULL; } EXPORT_SYMBOL(vmap); =20 @@ -2991,6 +3023,9 @@ static void *__vmalloc_area_node(struct vm_struct *ar= ea, gfp_t gfp_mask, goto fail; } =20 + if (asi_map_vm_area(area)) + goto fail; + return area->addr; =20 fail: @@ -3038,6 +3073,9 @@ void *__vmalloc_node_range(unsigned long size, unsign= ed long align, if (WARN_ON_ONCE(!size)) return NULL; =20 + if (static_asi_enabled() && (vm_flags & VM_GLOBAL_NONSENSITIVE)) + gfp_mask |=3D __GFP_ZERO; + if ((size >> PAGE_SHIFT) > totalram_pages()) { warn_alloc(gfp_mask, NULL, "vmalloc error: size %lu, exceeds total pages", @@ -3127,8 +3165,13 @@ void *__vmalloc_node_range(unsigned long size, unsig= ned long align, void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, int node, const void *caller) { + ulong vm_flags =3D 0; + + if (static_asi_enabled() && (gfp_mask & __GFP_GLOBAL_NONSENSITIVE)) + vm_flags |=3D VM_GLOBAL_NONSENSITIVE; + return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, - gfp_mask, PAGE_KERNEL, 0, node, caller); + gfp_mask, PAGE_KERNEL, vm_flags, node, caller); } /* * This is only for performance analysis of vmalloc and stress purpose. 
--=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2138FC433FE for ; Wed, 23 Feb 2022 05:24:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233068AbiBWFZT (ORCPT ); Wed, 23 Feb 2022 00:25:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238247AbiBWFY6 (ORCPT ); Wed, 23 Feb 2022 00:24:58 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABEA86CA4B for ; Tue, 22 Feb 2022 21:24:13 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id z15-20020a25bb0f000000b00613388c7d99so26692775ybg.8 for ; Tue, 22 Feb 2022 21:24:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=h48HgrV9Ms87ueeFjUPVjN/Yw+KwlPL1CcnJb5APGwE=; b=Hmt0w/kiWsp/YP+equQ01AdLjZMh/o/2Z8oEGQ6Lq4Nt157PlHz1XpRl+A7GPQXL2i FRSX/mXn548mQM/TC1p/MGu32EFPYWhSLwU4OefqIbwBe+iA80Pue4vU6e/ZB7nsR4Y4 523kigSioD12IuJ5T0licYK9j98yC1QXVRYs3anUzXJCRNXAH9qqii0hyNma152NX9Ps ggEjc8Pz3upnBint0xIR6P2QSeIhMzI8K84X5fX/AWfAvKn+5L2W9liNKxXaSWKaCFma +45lbJ2fC4qur3JJ5wHMSfnoIjceTnzqkA7atH2Cu3t0CWoRI/zCfcvLHfSjM3PIpuRT HILQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=h48HgrV9Ms87ueeFjUPVjN/Yw+KwlPL1CcnJb5APGwE=; b=2z++9053YQzYjmZH3K1S4qvNrxE20Pyw/ZbiGfiLMrT6CwOct71p8uVlLxbAOqdyqY jxLPIyj7tJeBureFE9OQb+uZnZSN32AG7WAld5bRgGWDmx9++RMks1teOcoTvuygGpzX 
hUjDz1taF4/FQWl83i6fHJCct7mlgtZ0sj+k0Zv66uaNs31C6IvKh0QperBAk1JVPeXF BToI5wIsjyoOyYVRFKVhtMAVRDQkhZ5Dtqlg2PJ8wRm8VDT7GgGiX8zGdi+i7LPk//hU Yh1qOIq5UEfbjunXJLKDqJoqYawKk8FmnzO9R2uAnxyjg2Esytyt9rPl4CRNS3CQgnvw Sy/g== X-Gm-Message-State: AOAM530aWTU8RVwYFOJenqeS7yGsvO31tOqQSfJaS9piAbb/aogdy7sz V2o9MAKzHA+dyqEZxJKORAUAoVIS9LjbfRHTTzu1RpNUlG2eS5H745T3zcLt2bxZy9MKNTs4umX xSqET2LuNkoccXPPifZe4sHibSYnOSznPXxS/B0ldIHryR9SPYalw1ivkcdBocaIo4W8NYPCr X-Google-Smtp-Source: ABdhPJyfC5FSpMaAtl/McL2SxzvvGsqtTFjrUAQSrC01vyuUjm1vV+WNWVvhncOg+7K7oU2DAPMbItihfugH X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1c47:0:b0:2d7:5822:1739 with SMTP id c68-20020a811c47000000b002d758221739mr11411035ywc.502.1645593852744; Tue, 22 Feb 2022 21:24:12 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:48 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-13-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 12/47] mm: asi: Support for global non-sensitive slab caches From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A new flag SLAB_GLOBAL_NONSENSITIVE is added, which would designate all objects within that slab cache to be globally non-sensitive. 
Another flag SLAB_NONSENSITIVE is also added, which is currently just an alias for SLAB_GLOBAL_NONSENSITIVE, but will eventually be used to designate slab caches which can allocate either global or local non-sensitive objects. In addition, new kmalloc caches have been added that can be used to allocate non-sensitive objects. Signed-off-by: Junaid Shahid --- include/linux/slab.h | 32 +++++++++++++++---- mm/slab.c | 5 +++ mm/slab.h | 14 ++++++++- mm/slab_common.c | 73 +++++++++++++++++++++++++++++++++----------- security/Kconfig | 2 +- 5 files changed, 101 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 181045148b06..7b8a3853d827 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -120,6 +120,12 @@ /* Slab deactivation flag */ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x10000000U) =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define SLAB_GLOBAL_NONSENSITIVE ((slab_flags_t __force)0x20000000U) +#else +#define SLAB_GLOBAL_NONSENSITIVE 0 +#endif + /* * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. * @@ -329,6 +335,11 @@ enum kmalloc_cache_type { extern struct kmem_cache * kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern struct kmem_cache * +nonsensitive_kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; +#endif + /* * Define gfp bits that should not be set for KMALLOC_NORMAL. */ @@ -361,6 +372,17 @@ static __always_inline enum kmalloc_cache_type kmalloc= _type(gfp_t flags) return KMALLOC_CGROUP; } =20 +static __always_inline struct kmem_cache *get_kmalloc_cache(gfp_t flags, + uint index) +{ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (static_asi_enabled() && (flags & __GFP_GLOBAL_NONSENSITIVE)) + return nonsensitive_kmalloc_caches[kmalloc_type(flags)][index]; +#endif + return kmalloc_caches[kmalloc_type(flags)][index]; +} + /* * Figure out which kmalloc slab an allocation of a certain size * belongs to. 
@@ -587,9 +609,8 @@ static __always_inline __alloc_size(1) void *kmalloc(si= ze_t size, gfp_t flags) if (!index) return ZERO_SIZE_PTR; =20 - return kmem_cache_alloc_trace( - kmalloc_caches[kmalloc_type(flags)][index], - flags, size); + return kmem_cache_alloc_trace(get_kmalloc_cache(flags, index), + flags, size); #endif } return __kmalloc(size, flags); @@ -605,9 +626,8 @@ static __always_inline __alloc_size(1) void *kmalloc_no= de(size_t size, gfp_t fla if (!i) return ZERO_SIZE_PTR; =20 - return kmem_cache_alloc_node_trace( - kmalloc_caches[kmalloc_type(flags)][i], - flags, node, size); + return kmem_cache_alloc_node_trace(get_kmalloc_cache(flags, i), + flags, node, size); } #endif return __kmalloc_node(size, flags, node); diff --git a/mm/slab.c b/mm/slab.c index ca4822f6b2b6..5a928d95d67b 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1956,6 +1956,9 @@ int __kmem_cache_create(struct kmem_cache *cachep, sl= ab_flags_t flags) size =3D ALIGN(size, REDZONE_ALIGN); } =20 + if (!static_asi_enabled()) + flags &=3D ~SLAB_NONSENSITIVE; + /* 3) caller mandated alignment */ if (ralign < cachep->align) { ralign =3D cachep->align; @@ -2058,6 +2061,8 @@ int __kmem_cache_create(struct kmem_cache *cachep, sl= ab_flags_t flags) cachep->allocflags |=3D GFP_DMA32; if (flags & SLAB_RECLAIM_ACCOUNT) cachep->allocflags |=3D __GFP_RECLAIMABLE; + if (flags & SLAB_GLOBAL_NONSENSITIVE) + cachep->allocflags |=3D __GFP_GLOBAL_NONSENSITIVE; cachep->size =3D size; cachep->reciprocal_buffer_size =3D reciprocal_value(size); =20 diff --git a/mm/slab.h b/mm/slab.h index 56ad7eea3ddf..f190f4fc0286 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -77,6 +77,10 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { const char *name[NR_KMALLOC_TYPES]; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + const char *nonsensitive_name[NR_KMALLOC_TYPES]; +#endif + slab_flags_t flags[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; =20 @@ -124,11 
+128,14 @@ static inline slab_flags_t kmem_cache_flags(unsigned = int object_size, } #endif =20 +/* This will also include SLAB_LOCAL_NONSENSITIVE in a later patch. */ +#define SLAB_NONSENSITIVE SLAB_GLOBAL_NONSENSITIVE =20 /* Legal flag mask for kmem_cache_create(), for various configurations */ #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ SLAB_CACHE_DMA32 | SLAB_PANIC | \ - SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS ) + SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \ + SLAB_NONSENSITIVE) =20 #if defined(CONFIG_DEBUG_SLAB) #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) @@ -491,6 +498,11 @@ static inline struct kmem_cache *slab_pre_alloc_hook(s= truct kmem_cache *s, =20 might_alloc(flags); =20 + if (static_asi_enabled()) { + VM_BUG_ON(!(s->flags & SLAB_GLOBAL_NONSENSITIVE) && + (flags & __GFP_GLOBAL_NONSENSITIVE)); + } + if (should_failslab(s, flags)) return NULL; =20 diff --git a/mm/slab_common.c b/mm/slab_common.c index e5d080a93009..72dee2494bf8 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -50,7 +50,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work, SLAB_FAILSLAB | kasan_never_merge()) =20 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \ - SLAB_CACHE_DMA32 | SLAB_ACCOUNT) + SLAB_CACHE_DMA32 | SLAB_ACCOUNT | SLAB_NONSENSITIVE) =20 /* * Merge control. If this is set then no merging of slab caches will occur. @@ -681,6 +681,15 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + = 1] __ro_after_init =3D { /* initialization for https://bugs.llvm.org/show_bug.cgi?id=3D42570 */ }; EXPORT_SYMBOL(kmalloc_caches); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_cache * +nonsensitive_kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro= _after_init =3D +{ /* initialization for https://bugs.llvm.org/show_bug.cgi?id=3D42570 */ }; +EXPORT_SYMBOL(nonsensitive_kmalloc_caches); + +#endif + /* * Conversion table for small slabs sizes / 8 to the index in the * kmalloc array. 
This is necessary for slabs < 192 since we have non power @@ -738,25 +747,34 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t fl= ags) index =3D fls(size - 1); } =20 - return kmalloc_caches[kmalloc_type(flags)][index]; + return get_kmalloc_cache(flags, index); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define __KMALLOC_NAME(type, base_name, sz) \ + .name[type] =3D base_name "-" #sz, \ + .nonsensitive_name[type] =3D "ns-" base_name "-" #sz, +#else +#define __KMALLOC_NAME(type, base_name, sz) \ + .name[type] =3D base_name "-" #sz, +#endif + #ifdef CONFIG_ZONE_DMA -#define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] =3D "dma-kmalloc-" #sz, +#define KMALLOC_DMA_NAME(sz) __KMALLOC_NAME(KMALLOC_DMA, "dma-kmalloc", sz) #else #define KMALLOC_DMA_NAME(sz) #endif =20 #ifdef CONFIG_MEMCG_KMEM -#define KMALLOC_CGROUP_NAME(sz) .name[KMALLOC_CGROUP] =3D "kmalloc-cg-" #s= z, +#define KMALLOC_CGROUP_NAME(sz) __KMALLOC_NAME(KMALLOC_CGROUP, "kmalloc-cg= ", sz) #else #define KMALLOC_CGROUP_NAME(sz) #endif =20 #define INIT_KMALLOC_INFO(__size, __short_size) \ { \ - .name[KMALLOC_NORMAL] =3D "kmalloc-" #__short_size, \ - .name[KMALLOC_RECLAIM] =3D "kmalloc-rcl-" #__short_size, \ + __KMALLOC_NAME(KMALLOC_NORMAL, "kmalloc", __short_size) \ + __KMALLOC_NAME(KMALLOC_RECLAIM, "kmalloc-rcl", __short_size) \ KMALLOC_CGROUP_NAME(__short_size) \ KMALLOC_DMA_NAME(__short_size) \ .size =3D __size, \ @@ -846,18 +864,30 @@ void __init setup_kmalloc_cache_index_table(void) static void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flag= s) { + struct kmem_cache *(*caches)[KMALLOC_SHIFT_HIGH + 1] =3D kmalloc_caches; + const char *name =3D kmalloc_info[idx].name[type]; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (flags & SLAB_NONSENSITIVE) { + caches =3D nonsensitive_kmalloc_caches; + name =3D kmalloc_info[idx].nonsensitive_name[type]; + } +#endif + if (type =3D=3D KMALLOC_RECLAIM) { flags |=3D SLAB_RECLAIM_ACCOUNT; } else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && 
(type =3D=3D KMALLOC_CGROUP))= { if (cgroup_memory_nokmem) { - kmalloc_caches[type][idx] =3D kmalloc_caches[KMALLOC_NORMAL][idx]; + caches[type][idx] =3D caches[KMALLOC_NORMAL][idx]; return; } flags |=3D SLAB_ACCOUNT; + } else if (IS_ENABLED(CONFIG_ZONE_DMA) && (type =3D=3D KMALLOC_DMA)) { + flags |=3D SLAB_CACHE_DMA; } =20 - kmalloc_caches[type][idx] =3D create_kmalloc_cache( - kmalloc_info[idx].name[type], + caches[type][idx] =3D create_kmalloc_cache(name, kmalloc_info[idx].size, flags, 0, kmalloc_info[idx].size); =20 @@ -866,7 +896,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type= , slab_flags_t flags) * KMALLOC_NORMAL caches. */ if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type =3D=3D KMALLOC_NORMAL)) - kmalloc_caches[type][idx]->refcount =3D -1; + caches[type][idx]->refcount =3D -1; } =20 /* @@ -908,15 +938,24 @@ void __init create_kmalloc_caches(slab_flags_t flags) for (i =3D 0; i <=3D KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s =3D kmalloc_caches[KMALLOC_NORMAL][i]; =20 - if (s) { - kmalloc_caches[KMALLOC_DMA][i] =3D create_kmalloc_cache( - kmalloc_info[i].name[KMALLOC_DMA], - kmalloc_info[i].size, - SLAB_CACHE_DMA | flags, 0, - kmalloc_info[i].size); - } + if (s) + new_kmalloc_cache(i, KMALLOC_DMA, flags); } #endif + /* + * TODO: We may want to make slab allocations without exiting ASI. + * In that case, the cache metadata itself would need to be + * treated as non-sensitive and mapped as such, and we would need to + * do the bootstrap much more carefully. We can do that if we find + * that slab allocations while inside a restricted address space are + * frequent enough to warrant the additional complexity. 
+ */ + if (static_asi_enabled()) + for (type =3D KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) + for (i =3D 0; i <=3D KMALLOC_SHIFT_HIGH; i++) + if (kmalloc_caches[type][i]) + new_kmalloc_cache(i, type, + flags | SLAB_NONSENSITIVE); } #endif /* !CONFIG_SLOB */ =20 diff --git a/security/Kconfig b/security/Kconfig index 21b15ecaf2c1..0a3e49d6a331 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -68,7 +68,7 @@ config PAGE_TABLE_ISOLATION config ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n - depends on X86_64 && !UML + depends on X86_64 && !UML && SLAB depends on !PARAVIRT help This feature provides the ability to run some kernel code --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04CCAC433EF for ; Wed, 23 Feb 2022 05:25:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238336AbiBWFZm (ORCPT ); Wed, 23 Feb 2022 00:25:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58080 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238300AbiBWFZE (ORCPT ); Wed, 23 Feb 2022 00:25:04 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD4156D4DC for ; Tue, 22 Feb 2022 21:24:23 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id m10-20020a25800a000000b0061daa5b7151so26449489ybk.10 for ; Tue, 22 Feb 2022 21:24:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=J1PJHeAg+5GsFDWdvzoAJvbqIGqngaUErLJYlufPa+k=; 
b=E8ylcvEY7HObuUi7YReOD7Xi+5ii//8STuZIJO/P8zS7AqqFvr59ONAjhLdHaJFCVd ANMtkHwUkTCW3M7pycyZwbgMNj908KdgtDwrheZQ6geVat7tNVM68QbPQQZm1CRl1l6l Y4ISDUE0Oi8e8zBW/PcuKqI33TayofpiZRwi9zPR+wcWJ0CNMO0p+82XCjaXWKVoLCcG P0m9VyO7WwmFPDExN5wo9Zkywpckb/NizMoYl/Y1S6i3OD/kq/dMFyBf0yRW04D/GG97 5W9at4d8tgmUGtV7LAQny8Hl7A704zcz/QnIbquS3/UrNABcFf6svGwZpVuu40Fb0cN9 xygg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=J1PJHeAg+5GsFDWdvzoAJvbqIGqngaUErLJYlufPa+k=; b=vUkCzKdjQyh9S1t/JxOtNV+K3Alz0urlRgXycXFvslac1PUcWxrODeGHoWpI3yAdUQ vPjeJb5CBOvAyiA3cQ39Evr6Goc/X30VATIFA3ErG3Ww5TFXvAAxW7EkzmwVAYa779wj qakTWkIm9DHJxGfepZyVzCyBOtOPip6YalGeRE0D70F0Nh68Ftijpt9Yt+Z+MnWwN46W yH97zLsOFSJryqfJC3gZ2WZOH0JcwWJASCZNIpiN5DeZd4thBYiEcyJSdHwTZWB4aOwq UFfQDndvCwVtTHT3oP+x2n765nvCBR0yuzN8LbiVBN0gHoUJb08mTEsY8XhA8K5MThzv kEvQ== X-Gm-Message-State: AOAM532Mx1E95oCNRNugsd5m6z//weyQ/Ybr9Bjy5OkmM598Ao+Ts50I HbpMTp0YHGlycrKY1+iPsi6GBhbxJL0kAtnxDBsMKANp2VgJZ2phyr8f0KcvnwgeGVteAplnCC5 UnJwzqg0NBFW++t3qcH44Dbt9Bfc4TOHQa1SiZC1GOPE1yK6Ke4GN6lZ0723k8bb/la8Q/irS X-Google-Smtp-Source: ABdhPJwO2zSkIt0Rgd4eSBL4j29VK3OAOmCOM+xoXLfQvXzLhs6JlCYQqS7jIhDe/nPgasCVydPPru7Y6T7c X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:106:0:b0:2d0:e682:8a7a with SMTP id 6-20020a810106000000b002d0e6828a7amr27939534ywb.257.1645593854910; Tue, 22 Feb 2022 21:24:14 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:49 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-14-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 13/47] asi: Added ASI memory cgroup flag From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, 
jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse Adds a cgroup flag to control if ASI is enabled for processes in that cgroup. Can be set or cleared by writing to the memory.use_asi file in the memory cgroup. The flag only affects new processes created after the flag was set. In addition to the cgroup flag, we may also want to add a per-process flag, though it will have to be something that can be set at process creation time. Signed-off-by: Ofir Weisse Co-developed-by: Junaid Shahid Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 14 ++++++++++++++ include/linux/memcontrol.h | 3 +++ include/linux/mm_types.h | 17 +++++++++++++++++ mm/memcontrol.c | 30 ++++++++++++++++++++++++++++++ 4 files changed, 64 insertions(+) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 71348399baf1..ca50a32ecd7e 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -2,6 +2,7 @@ =20 #include #include +#include =20 #include #include @@ -322,7 +323,20 @@ EXPORT_SYMBOL_GPL(asi_exit); =20 void asi_init_mm_state(struct mm_struct *mm) { + struct mem_cgroup *memcg =3D get_mem_cgroup_from_mm(mm); + memset(mm->asi, 0, sizeof(mm->asi)); + mm->asi_enabled =3D false; + + /* + * TODO: In addition to a cgroup flag, we may also want a per-process + * flag. 
+ */ + if (memcg) { + mm->asi_enabled =3D boot_cpu_has(X86_FEATURE_ASI) && + memcg->use_asi; + css_put(&memcg->css); + } } =20 static bool is_page_within_range(size_t addr, size_t page_size, diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0c5c403f4be6..a883cb458b06 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -259,6 +259,9 @@ struct mem_cgroup { */ bool oom_group; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + bool use_asi; +#endif /* protected by memcg_oom_lock */ bool oom_lock; int under_oom; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5b8028fcfe67..8624d2783661 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -607,6 +607,14 @@ struct mm_struct { * new_owner->alloc_lock is held */ struct task_struct __rcu *owner; + +#endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Is ASI enabled for this mm? ASI requires allocating extra + * resources, such as ASI page tables. To prevent allocating + * these resources for every mm in the system, we expect that + * only VM mm's will have this flag set. */ + bool asi_enabled; #endif struct user_namespace *user_ns; =20 @@ -665,6 +673,15 @@ struct mm_struct { =20 extern struct mm_struct init_mm; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static inline bool mm_asi_enabled(struct mm_struct *mm) +{ + return mm->asi_enabled; +} +#else +static inline bool mm_asi_enabled(struct mm_struct *mm) { return false; } +#endif + /* Pointer magic because the dynamic array size confuses some compilers.
*/ static inline void mm_init_cpumask(struct mm_struct *mm) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2ed5f2a0879d..a66d6b222ecf 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3539,6 +3539,29 @@ static int mem_cgroup_hierarchy_write(struct cgroup_= subsys_state *css, return -EINVAL; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static u64 mem_cgroup_asi_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return mem_cgroup_from_css(css)->use_asi; +} + +static int mem_cgroup_asi_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + + if (val =3D=3D 1 || val =3D=3D 0) + memcg->use_asi =3D val; + else + return -EINVAL; + + return 0; +} + +#endif + static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -4888,6 +4911,13 @@ static struct cftype mem_cgroup_legacy_files[] =3D { .write_u64 =3D mem_cgroup_hierarchy_write, .read_u64 =3D mem_cgroup_hierarchy_read, }, +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + { + .name =3D "use_asi", + .write_u64 =3D mem_cgroup_asi_write, + .read_u64 =3D mem_cgroup_asi_read, + }, +#endif { .name =3D "cgroup.event_control", /* XXX: for compat */ .write =3D memcg_write_event_control, --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8E19C433F5 for ; Wed, 23 Feb 2022 05:25:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238271AbiBWF0F (ORCPT ); Wed, 23 Feb 2022 00:26:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238226AbiBWFZf (ORCPT ); Wed, 23 Feb 2022 00:25:35 -0500 Received: from 
mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C2ED6AA56 for ; Tue, 22 Feb 2022 21:24:31 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id k7-20020a255607000000b00621afc793b8so26767243ybb.1 for ; Tue, 22 Feb 2022 21:24:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ObwiCZnkzVE8MG/ZKfaaMbJx1EDS+iQex1h+i9r1lWc=; b=gZ9xDnCTT9J2il7FQZjHDQXU7P24qq79E3XEa26PGLeqDS7nHRklN55Z0YrbQllfvk a54Tnc7QS05X1TZviBvKI6s2J4e0JaxU2Mc4n2JddMKt6uhs1bA5ThP/H5jQeUIa+Q5L YzXtSBRLRL23nYFayWpilSuGrgyaEtzeVJtrSMs+fJOYlGhJTGeNFUiGfho3OKAJCQvL xqhDtH8EyJDw3FY1cTPmzwnJaSBjdeIgm+usm4Xbnvmu4zwwAyO/ya/YpXR5JHhjT9H0 Wpbgj5s5UiyqBd60tJZ8fxwBXyX5DCw1p9aGG11pd0ckG2SC7DJ9q4R1uxUVnHb2V4YR +hNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ObwiCZnkzVE8MG/ZKfaaMbJx1EDS+iQex1h+i9r1lWc=; b=STXMqOZ2fnkcMaAIwO6uZ7xOxuFUZ7HT3oUA/+NKyED/khOBjdxeM8eZMW+mviHxsv aOZ3643VLS8C9qkblxbsApK6ARkw+H3xmk8YiLhuGIAN7F+2WCq97RMrSrWdWMQbsIcW +PG9n4yLPIa1Cu3RKsBUxbEyGxUWdy7Xn2SZq+KGxiOfojgq4CvdZPwC/HVuazOgEab7 vsrLAwNya4ulqkyQJDrfL6L5lSpzfL0npUqfdv8smNyZ2lDHkaY3xM5IIshnsMOAqCrU Gcj8Ipkpg3n+DNZzgcUUj3pr810QSYG0ocxG3OyTjtW1gDs0qyzFHkI6OIpwKPRoratS QNXA== X-Gm-Message-State: AOAM530Zo72AAbdVrLPs1WWY/w7+cYvhfCFoDlxJHancSLJrhwdp/zm9 B6yrVnce/mbQ2XSpQIWcFZiNZ/jJFeKB4WBLcvNbXqVzfQoH1+EYbF/qiaZ1r94f+Udn6LTdvZ5 baEAEC78kpRhCf5dSI5odMyFwQwGSv7+PGu4NNsvn/3ecxMSqBMJz/mDhKBwjWnEDHbx4I5lu X-Google-Smtp-Source: ABdhPJwnMSRMCv1r9WCSdtorggC4aqKc4d1lpyRHc4R/HT3Ens8gLxTDNQT3s1kJiX6qfOat6YB18uabGISu X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:db0d:0:b0:2d0:e912:3e47 with SMTP id 
d13-20020a0ddb0d000000b002d0e9123e47mr27008531ywe.23.1645593857064; Tue, 22 Feb 2022 21:24:17 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:50 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-15-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 14/47] mm: asi: Disable ASI API when ASI is not enabled for a process From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If ASI is not enabled for a process, then asi_init() will return a NULL ASI pointer as output, though it will return a 0 error code. All other ASI API functions will return without an error when they get a NULL ASI pointer. 
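The NULL-pointer contract described above can be sketched in userspace as follows. This is only an illustrative mock, not the kernel code: the struct layouts, the `initialized` and `entered` fields, and the `demo_vm_create()` caller are assumptions made up for the example, and the mock models only the enabled/disabled split of asi_init().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; the fields are hypothetical. */
struct asi { bool initialized; bool entered; };
struct mm_struct { bool asi_enabled; struct asi asi[2]; };

/* Returns 0 in both cases; *out_asi stays NULL when ASI is off for the mm. */
static int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi)
{
    *out_asi = NULL;
    if (!mm->asi_enabled)
        return 0;               /* not an error: ASI is simply unused */
    mm->asi[asi_index].initialized = true;
    *out_asi = &mm->asi[asi_index];
    return 0;
}

/* Every other entry point treats a NULL ASI pointer as a silent no-op. */
static void asi_enter(struct asi *asi)
{
    if (!asi)
        return;
    asi->entered = true;
}

static void asi_destroy(struct asi *asi)
{
    if (!asi)
        return;
    asi->initialized = false;
    asi->entered = false;
}

/* Hypothetical caller: no "is ASI enabled?" branch is needed here. */
static int demo_vm_create(struct mm_struct *mm, struct asi **vm_asi)
{
    int ret = asi_init(mm, 1, vm_asi);

    if (ret)
        return ret;
    asi_enter(*vm_asi);         /* harmless when *vm_asi is NULL */
    return 0;
}
```

The point of the design is that callers can store whatever asi_init() hands back and pass it to the rest of the API unconditionally, instead of checking the per-process setting at every call site.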
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 +- arch/x86/mm/asi.c | 18 ++++++++++-------- include/asm-generic/asi.h | 7 ++++++- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 64c2b4d1dba2..f69e1f2f09a4 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -51,7 +51,7 @@ int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops); void asi_unregister_class(int index); =20 -int asi_init(struct mm_struct *mm, int asi_index); +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi); void asi_destroy(struct asi *asi); =20 void asi_enter(struct asi *asi); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ca50a32ecd7e..58d1c532274a 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -207,11 +207,13 @@ static int __init asi_global_init(void) } subsys_initcall(asi_global_init) =20 -int asi_init(struct mm_struct *mm, int asi_index) +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { struct asi *asi =3D &mm->asi[asi_index]; =20 - if (!boot_cpu_has(X86_FEATURE_ASI)) + *out_asi =3D NULL; + + if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) return 0; =20 /* Index 0 is reserved for special purposes. 
*/ @@ -238,13 +240,15 @@ int asi_init(struct mm_struct *mm, int asi_index) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } =20 + *out_asi =3D asi; + return 0; } EXPORT_SYMBOL_GPL(asi_init); =20 void asi_destroy(struct asi *asi) { - if (!boot_cpu_has(X86_FEATURE_ASI)) + if (!boot_cpu_has(X86_FEATURE_ASI) || !asi) return; =20 asi_free_pgd(asi); @@ -278,11 +282,9 @@ void __asi_enter(void) =20 void asi_enter(struct asi *asi) { - if (!static_cpu_has(X86_FEATURE_ASI)) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) return; =20 - VM_WARN_ON_ONCE(!asi); - this_cpu_write(asi_cpu_state.target_asi, asi); barrier(); =20 @@ -423,7 +425,7 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len= , gfp_t gfp_flags) size_t end =3D start + len; size_t page_size; =20 - if (!static_cpu_has(X86_FEATURE_ASI)) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) return 0; =20 VM_BUG_ON(start & ~PAGE_MASK); @@ -514,7 +516,7 @@ void asi_unmap(struct asi *asi, void *addr, size_t len,= bool flush_tlb) size_t end =3D start + len; pgtbl_mod_mask mask =3D 0; =20 - if (!static_cpu_has(X86_FEATURE_ASI) || !len) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi || !len) return; =20 VM_BUG_ON(start & ~PAGE_MASK); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index f918cd052722..51c9c4a488e8 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -33,7 +33,12 @@ static inline void asi_unregister_class(int asi_index) {= } =20 static inline void asi_init_mm_state(struct mm_struct *mm) { } =20 -static inline int asi_init(struct mm_struct *mm, int asi_index) { return 0= ; } +static inline +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) +{ + *out_asi =3D NULL; + return 0; +} =20 static inline void asi_destroy(struct asi *asi) { } =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FE81C433FE for ; Wed, 23 Feb 2022 05:25:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238234AbiBWFZu (ORCPT ); Wed, 23 Feb 2022 00:25:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238253AbiBWFZg (ORCPT ); Wed, 23 Feb 2022 00:25:36 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0188E694AF for ; Tue, 22 Feb 2022 21:24:32 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d6b6cf0cafso150368717b3.21 for ; Tue, 22 Feb 2022 21:24:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=CYYrqCq/zRkZaGiiFDeL2WZqXbaiwLweAdvv3KPw30Q=; b=gttTspcTbeOxem9vIHTEp8HxBKLNL6kOKKunFMomQY7gQjAmmA7zLLKeW3pD5BHCG4 n+cavMcNG3z81q2MsF1OUm8LD7b7lNc/CiICe2jeX2JuESrjhWO3zBSCqlZo0v8ri/13 wSpPkKBkIUU4Lon1cqB6YDkf8UKIBOe6n1yOTYHkccD+ZHQbtAOffPifXTCZr0f1Etmj 7oxfXxlKJ56Y/Q570Q0HNRmXemUr5RxMgLdTEadmhWn3pzJBdmkD+yWBtTx1hkL9PZjC s4h9v+aGt/9IsWDNJnCTYuoYPrr3fqgrmkUg8lETXm3ikW63y2HAoRDcoo7ArQV+H7ZF BqDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=CYYrqCq/zRkZaGiiFDeL2WZqXbaiwLweAdvv3KPw30Q=; b=hYnqSFLGhx8jI1Sjr0hu39cFXpyPvorif9WzNMcZtNousTVoq/ZVKzZ+hUgcvc6o4b b2YDI2SO1kgRtGQJE8MmsQpkSymwJRDfKQlEp/FUvgupfANDgVuo2mijTBjV25dpsQ94 VnvvKJ4t2x+iM7kLfPgGtpqJJ1APCy0+oEt+KGpEkVoikyDv2k6mSt4ZcfuYLEQgwn79 FWX5uGRCcL4Ynh+sFRPZBamsyBvNl1minS7ZibpWITV+d0oEEkKEe+TqC6RKWH3j1ARc QyvR/wtiTM55EltOB+1Y1EU6DZpAXFgOiAzyYTVeT38PErA1g37i6zWvS048V6xW4nYf pXGg== 
X-Gm-Message-State: AOAM530+N7BzXXMczdaErrOwd3DanOWENIQH1hzGYaA6+VCMJmwkDaP/ UscExBXmFi9LWo9QZnTgcBuv6ESrwYAuhXq9K9CVlKxtuou8EeaDk42em+jGtHOwOsiyrDp3zp8 Del6ak6EP3VGR32dmWcd4UfPt/3lm9Ob1QVSxSpMMlEwzXGjzICUtjbl3x1/f9aSboJ1j8bfz X-Google-Smtp-Source: ABdhPJyHSwIU1ypDJg3UAIk4OCagkwsYrp4hfbAB5svLD0KoToAB7kgIvdsaPCw4R5P2Bj++u4Uv/BXKWrDE X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:9842:0:b0:2cb:86f2:560d with SMTP id p63-20020a819842000000b002cb86f2560dmr27884434ywg.375.1645593859219; Tue, 22 Feb 2022 21:24:19 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:51 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-16-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 15/47] kvm: asi: Restricted address space for VM execution From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" An ASI restricted address space is added for KVM. It is currently only enabled for Intel CPUs. The ASI hooks have been set up to do an L1D cache flush and MDS clear when entering the restricted address space. The hooks are also meant to stun and unstun the sibling hyperthread when exiting and entering the restricted address space, respectively.
Internally, we do have a full stunning implementation available, but it hasn't yet been determined whether it is fully compatible with the upstream core scheduling implementation, so it is not included in this patch series and instead this patch just includes corresponding stub functions to demonstrate where the stun/unstun would happen. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/vmx/vmx.c | 41 ++++++++++++----- arch/x86/kvm/x86.c | 81 ++++++++++++++++++++++++++++++++- include/linux/kvm_host.h | 2 + 4 files changed, 113 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 555f4de47ef2..98cbd6447e3e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1494,6 +1494,8 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + + void (*flush_sensitive_cpu_state)(struct kvm_vcpu *vcpu); }; =20 struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0dbf94eb954f..e0178b57be75 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -47,6 +47,7 @@ #include #include #include +#include =20 #include "capabilities.h" #include "cpuid.h" @@ -300,7 +301,7 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state= l1tf) else static_branch_disable(&vmx_l1d_should_flush); =20 - if (l1tf =3D=3D VMENTER_L1D_FLUSH_COND) + if (l1tf =3D=3D VMENTER_L1D_FLUSH_COND && !boot_cpu_has(X86_FEATURE_ASI)) static_branch_enable(&vmx_l1d_flush_cond); else static_branch_disable(&vmx_l1d_flush_cond); @@ -6079,6 +6080,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vc= pu) if (static_branch_likely(&vmx_l1d_flush_cond)) { bool flush_l1d; =20 + VM_BUG_ON(vcpu->kvm->asi); + /* * Clear the per-vcpu flush bit, it gets set again * either from vcpu_run() or from one of the unsafe @@ -6590,16 +6593,31 @@ static 
fastpath_t vmx_exit_handlers_fastpath(struct= kvm_vcpu *vcpu) } } =20 -static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, - struct vcpu_vmx *vmx) +static void vmx_flush_sensitive_cpu_state(struct kvm_vcpu *vcpu) { - kvm_guest_enter_irqoff(); - /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); else if (static_branch_unlikely(&mds_user_clear)) mds_clear_cpu_buffers(); +} + +static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, + struct vcpu_vmx *vmx) +{ + unsigned long cr3; + + kvm_guest_enter_irqoff(); + + vmx_flush_sensitive_cpu_state(vcpu); + + asi_enter(vcpu->kvm->asi); + + cr3 =3D __get_current_cr3_fast(); + if (unlikely(cr3 !=3D vmx->loaded_vmcs->host_state.cr3)) { + vmcs_writel(HOST_CR3, cr3); + vmx->loaded_vmcs->host_state.cr3 =3D cr3; + } =20 if (vcpu->arch.cr2 !=3D native_read_cr2()) native_write_cr2(vcpu->arch.cr2); @@ -6609,13 +6627,16 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_= vcpu *vcpu, =20 vcpu->arch.cr2 =3D native_read_cr2(); =20 + VM_WARN_ON_ONCE(vcpu->kvm->asi && !is_asi_active()); + asi_set_target_unrestricted(); + kvm_guest_exit_irqoff(); } =20 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx =3D to_vmx(vcpu); - unsigned long cr3, cr4; + unsigned long cr4; =20 /* Record the guest's net vcpu time for enforced NMI injections. 
*/ if (unlikely(!enable_vnmi && @@ -6657,12 +6678,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP)) vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); =20 - cr3 =3D __get_current_cr3_fast(); - if (unlikely(cr3 !=3D vmx->loaded_vmcs->host_state.cr3)) { - vmcs_writel(HOST_CR3, cr3); - vmx->loaded_vmcs->host_state.cr3 =3D cr3; - } - cr4 =3D cr4_read_shadow(); if (unlikely(cr4 !=3D vmx->loaded_vmcs->host_state.cr4)) { vmcs_writel(HOST_CR4, cr4); @@ -7691,6 +7706,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D { .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + + .flush_sensitive_cpu_state =3D vmx_flush_sensitive_cpu_state, }; =20 static __init void vmx_setup_user_return_msrs(void) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e50e97ac4408..dd07f677d084 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -81,6 +81,7 @@ #include #include #include +#include =20 #define CREATE_TRACE_POINTS #include "trace.h" @@ -297,6 +298,8 @@ EXPORT_SYMBOL_GPL(supported_xcr0); =20 static struct kmem_cache *x86_emulator_cache; =20 +static int __read_mostly kvm_asi_index; + /* * When called, it means the previous get/set msr reached an invalid msr. * Return true if we want to ignore/silent this failed msr access. @@ -8620,6 +8623,50 @@ static struct notifier_block pvclock_gtod_notifier = =3D { }; #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +/* + * We have an HT-stunning implementation available internally, + * but it is yet to be determined if it is fully compatible with the + * upstream core scheduling implementation. So leaving it out for now + * and just leaving these stubs here. + */ +static void stun_sibling(void) { } +static void unstun_sibling(void) { } + +/* + * This function must be fully re-entrant and idempotent. 
+ * Though the idempotency requirement could potentially be relaxed for stuff + * like stats where complete accuracy is not needed. + */ +static void kvm_pre_asi_exit(void) +{ + stun_sibling(); +} + +/* + * This function must be fully re-entrant and idempotent. + * Though the idempotency requirement could potentially be relaxed for stuff + * like stats where complete accuracy is not needed. + */ +static void kvm_post_asi_enter(void) +{ + struct kvm_vcpu *vcpu =3D raw_cpu_read(*kvm_get_running_vcpus()); + + kvm_x86_ops.flush_sensitive_cpu_state(vcpu); + + unstun_sibling(); +} + +#endif + +static const struct asi_hooks kvm_asi_hooks =3D { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + .pre_asi_exit =3D kvm_pre_asi_exit, + .post_asi_enter =3D kvm_post_asi_enter +#endif +}; + int kvm_arch_init(void *opaque) { struct kvm_x86_init_ops *ops =3D opaque; @@ -8674,6 +8721,15 @@ int kvm_arch_init(void *opaque) if (r) goto out_free_percpu; =20 + if (ops->runtime_ops->flush_sensitive_cpu_state) { + r =3D asi_register_class("KVM", ASI_MAP_STANDARD_NONSENSITIVE, + &kvm_asi_hooks); + if (r < 0) + goto out_mmu_exit; + + kvm_asi_index =3D r; + } + kvm_timer_init(); =20 perf_register_guest_info_callbacks(&kvm_guest_cbs); @@ -8694,6 +8750,8 @@ int kvm_arch_init(void *opaque) =20 return 0; =20 +out_mmu_exit: + kvm_mmu_module_exit(); out_free_percpu: free_percpu(user_return_msrs); out_free_x86_emulator_cache: @@ -8720,6 +8778,11 @@ void kvm_arch_exit(void) irq_work_sync(&pvclock_irq_work); cancel_work_sync(&pvclock_gtod_work); #endif + if (kvm_asi_index > 0) { + asi_unregister_class(kvm_asi_index); + kvm_asi_index =3D 0; + } + kvm_x86_ops.hardware_enable =3D NULL; kvm_mmu_module_exit(); free_percpu(user_return_msrs); @@ -11391,11 +11454,26 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); =20 kvm_apicv_init(kvm); + + if (kvm_asi_index > 0) { + ret =3D asi_init(kvm->mm, kvm_asi_index, &kvm->asi); + if (ret) +
goto error; + } + kvm_hv_init_vm(kvm); kvm_mmu_init_vm(kvm); kvm_xen_init_vm(kvm); =20 - return static_call(kvm_x86_vm_init)(kvm); + ret =3D static_call(kvm_x86_vm_init)(kvm); + if (ret) + goto error; + + return 0; +error: + kvm_page_track_cleanup(kvm); + asi_destroy(kvm->asi); + return ret; } =20 int kvm_arch_post_init_vm(struct kvm *kvm) @@ -11549,6 +11627,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + asi_destroy(kvm->asi); } =20 static void memslot_rmap_free(struct kvm_memory_slot *slot) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c310648cc8f1..9dd63ed21f75 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -38,6 +38,7 @@ =20 #include #include +#include =20 #ifndef KVM_MAX_VCPU_IDS #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS @@ -551,6 +552,7 @@ struct kvm { */ struct mutex slots_arch_lock; struct mm_struct *mm; /* userspace tied to this vm */ + struct asi *asi; struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB4F7C433FE for ; Wed, 23 Feb 2022 05:25:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238223AbiBWFZ4 (ORCPT ); Wed, 23 Feb 2022 00:25:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238264AbiBWFZh (ORCPT ); Wed, 23 Feb 2022 00:25:37 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4497A6C1E5 for ; Tue, 22 Feb 2022 21:24:36 
-0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id b18-20020a25fa12000000b0062412a8200eso20208494ybe.22 for ; Tue, 22 Feb 2022 21:24:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=aBDkmRlW2wO+VZF+yuNfjgMBKgOQnmUW3eUrwjZrCS8=; b=YOGSR3d5gsNbX6iUzwto0/6/twFuZLdwqQ+OyGBmDQ+Y0Z9tIxowj80ZJNXhvJxX+Y 39mdIpZnjCbsNOAMZ3WcDDj7mi8QZH0c/RsyMzdcAwnc6PykioINuQH19G0EC5hHlDik 23zXjrmfS9wbxjdu3O5ye1/Ud1PUIpPDZK7t9Nl7PN6MOX3LEYFZ3t9yeI4RY0EL+Zno WAZ3yPIe48Tjd2EqQRhcT9O3QomAL8NNi0J1Bk6EaY/8ic2RRlHQl+dYmmLeJ/OqUWOZ mOxvcqRrmNjCUbbMYTjvZgG5UsBId8RgCT1I73bDH8qdngvHAJMKuao3IUJPZHo/KDG+ YImA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=aBDkmRlW2wO+VZF+yuNfjgMBKgOQnmUW3eUrwjZrCS8=; b=YXmPSGlcNv+hT8JaWw9FelCtB55Waghw+iU/BbQ+V26ZxzhkLCKO5WH0+60nbilzlz NVQJIJ/EhXEloSOqj3V9uPJL+QuhzLeiG4KHvZBjjneihfsGans24NNiPRjF9boYVnQq vtJ+mpAwFz1yfjCqB3uhFWoeS1alxGfvmcUStHRBniOj58G9/PzZjic4QHKHiedOZaSw iZ51VEzCAZpBKN4KxXxKy05g41VUKs+T038wyFKGhbCiXGPjx47/Oq8HGlyQXLq0z63Y MhXfKV8FBbXNuRuV48Phikb5tVzEBlMYuFmCdAla/kx7lfv0r6asqB/c+zqyEhg2bXIe 859w== X-Gm-Message-State: AOAM533snVNRFQoeEWhLlWN5IXe/Vv+yOlBGCXZN3BBnoLwS3TYKucO5 fs6sgULy6Y0vZBG25n7wUGxc62wy2z5uZQCifXOhpA5miUflzOyOs8Kt+7SlqV4oR2y0Ajr2Ejj TlLAfTzGqcpf5PRdrQilgXjxdtgECZszV41/W8d0cGlNMvbw1irGX05ses07yIR8xhDq9QlY2 X-Google-Smtp-Source: ABdhPJxviHHfMBTme5NCgrk3enqRX6Pa+N3RjWDQxUKeUW0rsQhJHGhPnnpDhjrkSitIn82U+v9xIBog7t0o X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7489:0:b0:2d1:518:8c57 with SMTP id p131-20020a817489000000b002d105188c57mr27268814ywc.69.1645593861309; Tue, 22 Feb 2022 21:24:21 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:52 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> 
Message-Id: <20220223052223.1202152-17-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 16/47] mm: asi: Support for mapping non-sensitive pcpu chunks From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds support for mapping and unmapping dynamic percpu chunks as globally non-sensitive. A later patch will modify the percpu allocator to use this for dynamically allocating non-sensitive percpu memory. Signed-off-by: Junaid Shahid --- include/linux/vmalloc.h | 4 ++-- mm/percpu-vm.c | 51 +++++++++++++++++++++++++++++++++-------- mm/vmalloc.c | 17 ++++++++++---- security/Kconfig | 2 +- 4 files changed, 58 insertions(+), 16 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index c7c66decda3e..5f85690f27b6 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -260,14 +260,14 @@ extern __init void vm_area_register_early(struct vm_s= truct *vm, size_t align); # ifdef CONFIG_MMU struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align); + size_t align, ulong flags); =20 void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms); # else static inline struct vm_struct ** pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align) + size_t align, ulong flags) { return NULL; } diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c index 2054c9213c43..5579a96ad782 100644 --- a/mm/percpu-vm.c +++ b/mm/percpu-vm.c @@ 
-153,8 +153,12 @@ static void __pcpu_unmap_pages(unsigned long addr, int= nr_pages) static void pcpu_unmap_pages(struct pcpu_chunk *chunk, struct page **pages, int page_start, int page_end) { + struct vm_struct **vms =3D (struct vm_struct **)chunk->data; unsigned int cpu; int i; + ulong addr, nr_pages; + + nr_pages =3D page_end - page_start; =20 for_each_possible_cpu(cpu) { for (i =3D page_start; i < page_end; i++) { @@ -164,8 +168,14 @@ static void pcpu_unmap_pages(struct pcpu_chunk *chunk, WARN_ON(!page); pages[pcpu_page_idx(cpu, i)] =3D page; } - __pcpu_unmap_pages(pcpu_chunk_addr(chunk, cpu, page_start), - page_end - page_start); + addr =3D pcpu_chunk_addr(chunk, cpu, page_start); + + /* TODO: We should batch the TLB flushes */ + if (vms[0]->flags & VM_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE, true); + + __pcpu_unmap_pages(addr, nr_pages); } } =20 @@ -212,18 +222,30 @@ static int __pcpu_map_pages(unsigned long addr, struc= t page **pages, * reverse lookup (addr -> chunk). 
*/ static int pcpu_map_pages(struct pcpu_chunk *chunk, - struct page **pages, int page_start, int page_end) + struct page **pages, int page_start, int page_end, + gfp_t gfp) { unsigned int cpu, tcpu; int i, err; + ulong addr, nr_pages; + + nr_pages =3D page_end - page_start; =20 for_each_possible_cpu(cpu) { - err =3D __pcpu_map_pages(pcpu_chunk_addr(chunk, cpu, page_start), + addr =3D pcpu_chunk_addr(chunk, cpu, page_start); + err =3D __pcpu_map_pages(addr, &pages[pcpu_page_idx(cpu, page_start)], - page_end - page_start); + nr_pages); if (err < 0) goto err; =20 + if (gfp & __GFP_GLOBAL_NONSENSITIVE) { + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE); + if (err) + goto err; + } + for (i =3D page_start; i < page_end; i++) pcpu_set_page_chunk(pages[pcpu_page_idx(cpu, i)], chunk); @@ -231,10 +253,15 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk, return 0; err: for_each_possible_cpu(tcpu) { + addr =3D pcpu_chunk_addr(chunk, tcpu, page_start); + + if (gfp & __GFP_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE, false); + + __pcpu_unmap_pages(addr, nr_pages); if (tcpu =3D=3D cpu) break; - __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start), - page_end - page_start); } pcpu_post_unmap_tlb_flush(chunk, page_start, page_end); return err; @@ -285,7 +312,7 @@ static int pcpu_populate_chunk(struct pcpu_chunk *chunk, if (pcpu_alloc_pages(chunk, pages, page_start, page_end, gfp)) return -ENOMEM; =20 - if (pcpu_map_pages(chunk, pages, page_start, page_end)) { + if (pcpu_map_pages(chunk, pages, page_start, page_end, gfp)) { pcpu_free_pages(chunk, pages, page_start, page_end); return -ENOMEM; } @@ -334,13 +361,19 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) { struct pcpu_chunk *chunk; struct vm_struct **vms; + ulong vm_flags =3D 0; + + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + vm_flags =3D VM_GLOBAL_NONSENSITIVE; + + gfp &=3D ~__GFP_GLOBAL_NONSENSITIVE; 
=20 chunk =3D pcpu_alloc_chunk(gfp); if (!chunk) return NULL; =20 vms =3D pcpu_get_vm_areas(pcpu_group_offsets, pcpu_group_sizes, - pcpu_nr_groups, pcpu_atom_size); + pcpu_nr_groups, pcpu_atom_size, vm_flags); if (!vms) { pcpu_free_chunk(chunk); return NULL; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ba588a37ee75..f13bfe7e896b 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3664,10 +3664,10 @@ pvm_determine_end_from_reverse(struct vmap_area **v= a, unsigned long align) */ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align) + size_t align, ulong flags) { - const unsigned long vmalloc_start =3D ALIGN(VMALLOC_START, align); - const unsigned long vmalloc_end =3D VMALLOC_END & ~(align - 1); + unsigned long vmalloc_start =3D VMALLOC_START; + unsigned long vmalloc_end =3D VMALLOC_END; struct vmap_area **vas, *va; struct vm_struct **vms; int area, area2, last_area, term_area; @@ -3677,6 +3677,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned = long *offsets, =20 /* verify parameters and allocate data structures */ BUG_ON(offset_in_page(align) || !is_power_of_2(align)); + + if (static_asi_enabled() && (flags & VM_GLOBAL_NONSENSITIVE)) { + vmalloc_start =3D VMALLOC_GLOBAL_NONSENSITIVE_START; + vmalloc_end =3D VMALLOC_GLOBAL_NONSENSITIVE_END; + } + + vmalloc_start =3D ALIGN(vmalloc_start, align); + vmalloc_end =3D vmalloc_end & ~(align - 1); + for (last_area =3D 0, area =3D 0; area < nr_vms; area++) { start =3D offsets[area]; end =3D start + sizes[area]; @@ -3815,7 +3824,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned l= ong *offsets, for (area =3D 0; area < nr_vms; area++) { insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list); =20 - setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC, + setup_vmalloc_vm_locked(vms[area], vas[area], flags | VM_ALLOC, pcpu_get_vm_areas); } spin_unlock(&vmap_area_lock); diff --git a/security/Kconfig b/security/Kconfig index 
0a3e49d6a331..e89c2658e6cf 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -68,7 +68,7 @@ config PAGE_TABLE_ISOLATION config ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n - depends on X86_64 && !UML && SLAB + depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM depends on !PARAVIRT help This feature provides the ability to run some kernel code --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86266C433EF for ; Wed, 23 Feb 2022 05:25:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238178AbiBWF0D (ORCPT ); Wed, 23 Feb 2022 00:26:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238270AbiBWFZh (ORCPT ); Wed, 23 Feb 2022 00:25:37 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76F1B6C1F9 for ; Tue, 22 Feb 2022 21:24:37 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d5258e8a4bso162221097b3.22 for ; Tue, 22 Feb 2022 21:24:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=orMBAJTu5bZfo+ZjtAIW/XRZfJpT6yrC1YB1ci4mdhQ=; b=i488SXIT1e9g9PD4iGqqQ5k0m2F8EhUxsfxeEhnLAG3wqUbodJtbO+eBa1paYhzivd f6qEUMrlJuQyIim27Unr9cJtD2qOlwrJ3Vb92Q7HTAPrjZK/4FZKPxvkHiWfCGgfeGks wPJV2pOqdvyTeI1kLDQCnafd3hNR4RZrBUmWr4GBRfp9dxcFXFHOnNlY5ByTNaX9Ovn7 medb8pTA3zAjMiK0tjiHROjMfziSZ3ZN7ikLLMgh9Do8+8uTWIEbihQ6o2v0V8CMkelT 
vayq0rCzDDSabPvpcm7/0+wlNG5uCJLTrXlA3oAlDMciITRnUAJzKr5xLZ2ttczqv0Td DGUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=orMBAJTu5bZfo+ZjtAIW/XRZfJpT6yrC1YB1ci4mdhQ=; b=RWzfVYnDSqCjFj9r8cdeX6m8pL73ylwp0mr6t3xs169B/XSNZ65RnDcG/Y3uhGrpNq UPR3WnTegmBuTjUqeBYefToXnOjWvlBi2ZbqEH6Uq1zC8/bS6ygAyEGB4H1vc0s40kyD MK966A7Z8p6KbzoUQCOkjX+AHYYZYuAU9JR3iAbWTNGh39/wFwi7OMJw/9LgSWMR7HjR aJopOHOTZMHdw6NI4dXydrE9xZtk6RSmyFMOGWzNOFTig4t6F+ObMKmsLNJLLG/U52ft esJXJYejjdFc5v3raBcQU3/ojfx2x5c13iDnCEyN1rfUaHMIFNJ/A5n2J64NynFlM+HC UC8A== X-Gm-Message-State: AOAM532cJo/bGCPE9d62zrCq/u/xvRq1AdpQH7KtIzl3+qZtEimFuLPG 4COvB3BC2QM0QjsTazLHJ2KHogN0qp532JDdulkA21KRlXRQAWwQ7Y88maKJDXbeyGgaQBeDMJK enXlYHB3Eyeuj+popPGcu4QKG7bIZktsD3lYnD68Ur5HHsNwVE9LzlEzcsz6VDnMtuhu20bI5 X-Google-Smtp-Source: ABdhPJz6tUJjJr3TTCqxc+szuwfGgWLkVUJz81MU2cNZP+P/jcoAt0UrWZKKXpvg67XLb/VTmlwpxyNA5Syq X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:a4e8:0:b0:61e:1eb6:19bd with SMTP id g95-20020a25a4e8000000b0061e1eb619bdmr27268416ybi.168.1645593863676; Tue, 22 Feb 2022 21:24:23 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:53 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-18-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 17/47] mm: asi: Aliased direct map for local non-sensitive allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This creates a second copy of the direct map, which mirrors the normal direct map in the regular unrestricted kernel page tables. But in the ASI restricted address spaces, the page tables for this aliased direct map would be local to each process. So this aliased map can be used for locally non-sensitive page allocations. Because of the lack of available kernel virtual address space, we have to reduce the max possible direct map size by half. That should be fine with 5 level page tables but could be an issue with 4 level page tables (as max 32 TB RAM could be supported instead of 64 TB). An alternative vmap-style implementation of an aliased local region is possible without this limitation, but that has some other compromises and would be usable only if we trim down the types of structures marked as local non-sensitive by limiting the designation to only those that really are locally non-sensitive but globally sensitive. That is certainly ideal and likely feasible, and would also allow removal of some other relatively complex infrastructure introduced in later patches. But we are including this implementation here just for demonstration of a fully general mechanism. An altogether different alternative to a separate aliased region is also possible by just partitioning the regular direct map (either statically or dynamically via additional page-block types), which is certainly feasible but would require more effort to implement properly. 
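As a rough illustration of the layout described above, the following userspace sketch models the two direct-map aliases and the halved physical memory limit. It is not code from this series: the MODEL_* constants and model_*() helpers are hypothetical names invented for the example, 4-level paging without KASLR is assumed, and the real translation is done by asi_va() and __phys_addr() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins only (assumed 4-level paging, no KASLR). */
#define MODEL_MAX_PHYSMEM_BITS 46                    /* 2^46 bytes = 64 TB */
#define MODEL_PAGE_OFFSET      0xffff888000000000ULL /* regular direct map base */
#define MODEL_PGDIR_SIZE       (1ULL << 39)          /* range covered by one PGD entry */

#define MODEL_ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((a) - 1))

/* Largest direct map the model supports; ASI gives up one address bit. */
uint64_t model_max_direct_map_bytes(bool asi_enabled)
{
	return 1ULL << (MODEL_MAX_PHYSMEM_BITS - (asi_enabled ? 1 : 0));
}

/* The aliased map starts half a direct map above the regular one, PGD-aligned. */
uint64_t model_local_map_base(void)
{
	return MODEL_PAGE_OFFSET +
	       MODEL_ALIGN_UP(1ULL << (MODEL_MAX_PHYSMEM_BITS - 1), MODEL_PGDIR_SIZE);
}

/* Physical to virtual: pick the alias from a per-page "local non-sensitive" flag. */
uint64_t model_va(uint64_t pa, bool local_nonsensitive)
{
	assert(pa < model_max_direct_map_bytes(true));
	return pa + (local_nonsensitive ? model_local_map_base() : MODEL_PAGE_OFFSET);
}

/* Virtual to physical: either alias must resolve to the same physical address. */
uint64_t model_pa(uint64_t va)
{
	uint64_t base = (va >= model_local_map_base()) ? model_local_map_base()
						       : MODEL_PAGE_OFFSET;
	return va - base;
}
```

Under these assumptions, model_va(pa, true) and model_va(pa, false) differ by exactly 32 TB and both translate back to the same pa, which is why the supported direct map shrinks from 64 TB to 32 TB with 4-level page tables.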
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/page.h | 19 +++++++- arch/x86/include/asm/page_64.h | 25 +++++++++- arch/x86/include/asm/page_64_types.h | 20 ++++++++ arch/x86/kernel/e820.c | 7 ++- arch/x86/mm/asi.c | 69 +++++++++++++++++++++++++++- arch/x86/mm/kaslr.c | 34 +++++++++++++- arch/x86/mm/mm_internal.h | 2 + arch/x86/mm/physaddr.c | 8 ++++ include/linux/page-flags.h | 3 ++ include/trace/events/mmflags.h | 3 +- security/Kconfig | 1 + 11 files changed, 183 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 4d5810c8fab7..7688ba9d3542 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -18,6 +18,7 @@ =20 struct page; =20 +#include #include extern struct range pfn_mapped[]; extern int nr_pfn_mapped; @@ -56,8 +57,24 @@ static inline void copy_user_page(void *to, void *from, = unsigned long vaddr, __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x))) =20 #ifndef __va -#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) + +#define ___va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) + +#ifndef CONFIG_ADDRESS_SPACE_ISOLATION +#define __va(x) ___va(x) +#else + +DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); +void *asi_va(unsigned long pa); + +/* + * This might significantly increase the size of the jump table. + * If that turns out to be a problem, we should use a non-static branch. + */ +#define __va(x) (static_branch_likely(&asi_local_map_initialized) \ + ? 
asi_va((unsigned long)(x)) : ___va(x)) #endif +#endif /* __va */ =20 #define __boot_va(x) __va(x) #define __boot_pa(x) __pa(x) diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index 4bde0dc66100..2845eca02552 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -5,6 +5,7 @@ #include =20 #ifndef __ASSEMBLY__ +#include #include =20 /* duplicated to the one in bootmem.h */ @@ -15,12 +16,34 @@ extern unsigned long page_offset_base; extern unsigned long vmalloc_base; extern unsigned long vmemmap_base; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +extern unsigned long asi_local_map_base; +DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); + +#else + +/* Should never be used if ASI is not enabled */ +#define asi_local_map_base (*(ulong *)NULL) + +#endif + static inline unsigned long __phys_addr_nodebug(unsigned long x) { unsigned long y =3D x - __START_KERNEL_map; + unsigned long map_start =3D PAGE_OFFSET; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * This might significantly increase the size of the jump table. + * If that turns out to be a problem, we should use a non-static branch. + */ + if (static_branch_likely(&asi_local_map_initialized) && + x > ASI_LOCAL_MAP) + map_start =3D ASI_LOCAL_MAP; +#endif /* use the carry flag to determine if x was < __START_KERNEL_map */ - x =3D y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET)); + x =3D y + ((x > y) ? 
phys_base : (__START_KERNEL_map - map_start)); =20 return x; } diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/pa= ge_64_types.h index e9e2c3ba5923..bd27ebe51a8c 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PAGE_64_DEFS_H #define _ASM_X86_PAGE_64_DEFS_H =20 +#include + #ifndef __ASSEMBLY__ #include #endif @@ -47,6 +49,24 @@ #define __PAGE_OFFSET __PAGE_OFFSET_BASE_L4 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */ =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +#define __ASI_LOCAL_MAP_BASE (__PAGE_OFFSET + \ + ALIGN(_BITUL(MAX_PHYSMEM_BITS - 1), PGDIR_SIZE)) + +#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT +#define ASI_LOCAL_MAP asi_local_map_base +#else +#define ASI_LOCAL_MAP __ASI_LOCAL_MAP_BASE +#endif + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +/* Should never be used if ASI is not enabled */ +#define ASI_LOCAL_MAP (*(ulong *)NULL) + +#endif + #define __START_KERNEL_map _AC(0xffffffff80000000, UL) =20 /* See Documentation/x86/x86_64/mm.rst for a description of the memory map= . 
*/ diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f0deed..e2ea4d6bfbdf 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -880,6 +880,11 @@ static void __init early_panic(char *msg) =20 static int userdef __initdata; =20 +u64 __init set_phys_mem_limit(u64 size) +{ + return e820__range_remove(size, ULLONG_MAX - size, E820_TYPE_RAM, 1); +} + /* The "mem=3Dnopentium" boot option disables 4MB page tables on 32-bit ke= rnels: */ static int __init parse_memopt(char *p) { @@ -905,7 +910,7 @@ static int __init parse_memopt(char *p) if (mem_size =3D=3D 0) return -EINVAL; =20 - e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1); + set_phys_mem_limit(mem_size); =20 #ifdef CONFIG_MEMORY_HOTPLUG max_mem_size =3D mem_size; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 58d1c532274a..38eaa650bac1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -22,6 +22,12 @@ EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); =20 __aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; =20 +DEFINE_STATIC_KEY_FALSE(asi_local_map_initialized); +EXPORT_SYMBOL(asi_local_map_initialized); + +unsigned long asi_local_map_base __ro_after_init; +EXPORT_SYMBOL(asi_local_map_base); + int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops) { @@ -181,8 +187,44 @@ static void asi_free_pgd(struct asi *asi) =20 static int __init set_asi_param(char *str) { - if (strcmp(str, "on") =3D=3D 0) + if (strcmp(str, "on") =3D=3D 0) { + /* TODO: We should eventually add support for KASAN. */ + if (IS_ENABLED(CONFIG_KASAN)) { + pr_warn("ASI is currently not supported with KASAN"); + return 0; + } + + /* + * We create a second copy of the direct map for the aliased + * ASI Local Map, so we can support only half of the max + * amount of RAM. That should be fine with 5 level page tables + * but could be an issue with 4 level page tables. 
+ * + * An alternative vmap-style implementation of an aliased local + * region is possible without this limitation, but that has + * some other compromises and would be usable only if + * we trim down the types of structures marked as local + * non-sensitive by limiting the designation to only those that + * really are locally non-sensitive but globally sensitive. + * That is certainly ideal and likely feasible, and would also + * allow removal of some other relatively complex infrastructure + * introduced in later patches. But we are including this + * implementation here just for demonstration of a fully general + * mechanism. + * + * An altogether different alternative to a separate aliased + * region is also possible by just partitioning the regular + * direct map (either statically or dynamically via additional + * page-block types), which is certainly feasible but would + * require more effort to implement properly. + */ + if (set_phys_mem_limit(MAXMEM / 2)) + pr_warn("Limiting Memory Size to %llu", MAXMEM / 2); + + asi_local_map_base =3D __ASI_LOCAL_MAP_BASE; + setup_force_cpu_cap(X86_FEATURE_ASI); + } =20 return 0; } @@ -190,6 +232,8 @@ early_param("asi", set_asi_param); =20 static int __init asi_global_init(void) { + uint i, n; + if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; =20 @@ -203,6 +247,14 @@ static int __init asi_global_init(void) VMALLOC_GLOBAL_NONSENSITIVE_END, "ASI Global Non-sensitive vmalloc"); =20 + /* TODO: We should also handle memory hotplug. 
*/ + n =3D DIV_ROUND_UP(PFN_PHYS(max_pfn), PGDIR_SIZE); + for (i =3D 0; i < n; i++) + swapper_pg_dir[pgd_index(ASI_LOCAL_MAP) + i] =3D + swapper_pg_dir[pgd_index(PAGE_OFFSET) + i]; + + static_branch_enable(&asi_local_map_initialized); + return 0; } subsys_initcall(asi_global_init) @@ -236,7 +288,11 @@ int asi_init(struct mm_struct *mm, int asi_index, stru= ct asi **out_asi) if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; =20 - for (i =3D KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++) + for (i =3D KERNEL_PGD_BOUNDARY; i < pgd_index(ASI_LOCAL_MAP); i++) + set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + + for (i =3D pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); + i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } =20 @@ -534,3 +590,12 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, = size_t len) /* Later patches will do a more optimized flush. */ flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); } + +void *asi_va(unsigned long pa) +{ + struct page *page =3D pfn_to_page(PHYS_PFN(pa)); + + return (void *)(pa + (PageLocalNonSensitive(page) + ? 
ASI_LOCAL_MAP : PAGE_OFFSET)); +} +EXPORT_SYMBOL(asi_va); diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 557f0fe25dff..2e68ce84767c 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -48,6 +48,7 @@ static const unsigned long vaddr_end =3D CPU_ENTRY_AREA_B= ASE; static __initdata struct kaslr_memory_region { unsigned long *base; unsigned long size_tb; + unsigned long extra_bytes; } kaslr_regions[] =3D { { &page_offset_base, 0 }, { &vmalloc_base, 0 }, @@ -57,7 +58,7 @@ static __initdata struct kaslr_memory_region { /* Get size in bytes used by the memory region */ static inline unsigned long get_padding(struct kaslr_memory_region *region) { - return (region->size_tb << TB_SHIFT); + return (region->size_tb << TB_SHIFT) + region->extra_bytes; } =20 /* Initialize base and padding for each memory region randomized with KASL= R */ @@ -69,6 +70,8 @@ void __init kernel_randomize_memory(void) struct rnd_state rand_state; unsigned long remain_entropy; unsigned long vmemmap_size; + unsigned int max_physmem_bits =3D MAX_PHYSMEM_BITS - + !!boot_cpu_has(X86_FEATURE_ASI); =20 vaddr_start =3D pgtable_l5_enabled() ? 
__PAGE_OFFSET_BASE_L5 : __PAGE_OFF= SET_BASE_L4; vaddr =3D vaddr_start; @@ -85,7 +88,7 @@ void __init kernel_randomize_memory(void) if (!kaslr_memory_enabled()) return; =20 - kaslr_regions[0].size_tb =3D 1 << (MAX_PHYSMEM_BITS - TB_SHIFT); + kaslr_regions[0].size_tb =3D 1 << (max_physmem_bits - TB_SHIFT); kaslr_regions[1].size_tb =3D VMALLOC_SIZE_TB; =20 /* @@ -100,6 +103,18 @@ void __init kernel_randomize_memory(void) if (memory_tb < kaslr_regions[0].size_tb) kaslr_regions[0].size_tb =3D memory_tb; =20 + if (boot_cpu_has(X86_FEATURE_ASI)) { + ulong direct_map_size =3D kaslr_regions[0].size_tb << TB_SHIFT; + + /* Reserve additional space for the ASI Local Map */ + direct_map_size =3D round_up(direct_map_size, PGDIR_SIZE); + direct_map_size *=3D 2; + VM_BUG_ON(direct_map_size % (1UL << TB_SHIFT)); + + kaslr_regions[0].size_tb =3D direct_map_size >> TB_SHIFT; + kaslr_regions[0].extra_bytes =3D PGDIR_SIZE; + } + /* * Calculate the vmemmap region size in TBs, aligned to a TB * boundary. @@ -136,6 +151,21 @@ void __init kernel_randomize_memory(void) vaddr =3D round_up(vaddr + 1, PUD_SIZE); remain_entropy -=3D entropy; } + + /* + * This ensures that the ASI Local Map does not share a PGD entry with + * the regular direct map, and also that the alignment of the two + * regions is the same. + * + * We are relying on the fact that the region following the ASI Local + * Map will be the local non-sensitive portion of the VMALLOC region. + * If that were not the case and the next region was a global one, + * then we would need extra padding after the ASI Local Map to ensure + * that it doesn't share a PGD entry with that global region. 
+ */ + if (cpu_feature_enabled(X86_FEATURE_ASI)) + asi_local_map_base =3D page_offset_base + PGDIR_SIZE + + ((kaslr_regions[0].size_tb / 2) << TB_SHIFT); } =20 void __meminit init_trampoline_kaslr(void) diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h index a1e8c523ab08..ace1d0b6d2d9 100644 --- a/arch/x86/mm/mm_internal.h +++ b/arch/x86/mm/mm_internal.h @@ -28,4 +28,6 @@ void update_cache_mode_entry(unsigned entry, enum page_ca= che_mode cache); =20 extern unsigned long tlb_single_page_flush_ceiling; =20 +u64 set_phys_mem_limit(u64 size); + #endif /* __X86_MM_INTERNAL_H */ diff --git a/arch/x86/mm/physaddr.c b/arch/x86/mm/physaddr.c index fc3f3d3e2ef2..2cd6cee942da 100644 --- a/arch/x86/mm/physaddr.c +++ b/arch/x86/mm/physaddr.c @@ -21,6 +21,9 @@ unsigned long __phys_addr(unsigned long x) x =3D y + phys_base; =20 VIRTUAL_BUG_ON(y >=3D KERNEL_IMAGE_SIZE); + } else if (cpu_feature_enabled(X86_FEATURE_ASI) && x > ASI_LOCAL_MAP) { + x -=3D ASI_LOCAL_MAP; + VIRTUAL_BUG_ON(!phys_addr_valid(x)); } else { x =3D y + (__START_KERNEL_map - PAGE_OFFSET); =20 @@ -28,6 +31,7 @@ unsigned long __phys_addr(unsigned long x) VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x)); } =20 + VIRTUAL_BUG_ON(!pfn_valid(x >> PAGE_SHIFT)); return x; } EXPORT_SYMBOL(__phys_addr); @@ -54,6 +58,10 @@ bool __virt_addr_valid(unsigned long x) =20 if (y >=3D KERNEL_IMAGE_SIZE) return false; + } else if (cpu_feature_enabled(X86_FEATURE_ASI) && x > ASI_LOCAL_MAP) { + x -=3D ASI_LOCAL_MAP; + if (!phys_addr_valid(x)) + return false; } else { x =3D y + (__START_KERNEL_map - PAGE_OFFSET); =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index a07434cc679c..e5223a05c41a 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -143,6 +143,7 @@ enum pageflags { #endif #ifdef CONFIG_ADDRESS_SPACE_ISOLATION PG_global_nonsensitive, + PG_local_nonsensitive, #endif __NR_PAGEFLAGS, =20 @@ -547,8 +548,10 @@ PAGEFLAG(Idle, idle, PF_ANY) =20 #ifdef 
CONFIG_ADDRESS_SPACE_ISOLATION __PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY); +__PAGEFLAG(LocalNonSensitive, local_nonsensitive, PF_ANY); #else __PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive); +__PAGEFLAG_FALSE(LocalNonSensitive, local_nonsensitive); #endif =20 #ifdef CONFIG_KASAN_HW_TAGS diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 73a49197ef54..96e61d838bec 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -129,7 +129,8 @@ IF_HAVE_PG_IDLE(PG_young, "young" ) \ IF_HAVE_PG_IDLE(PG_idle, "idle" ) \ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) \ IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") \ -IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") +IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") \ +IF_HAVE_ASI(PG_local_nonsensitive, "local_nonsensitive") =20 #define show_page_flags(flags) \ (flags) ? __print_flags(flags, "|", \ diff --git a/security/Kconfig b/security/Kconfig index e89c2658e6cf..070a948b5266 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -70,6 +70,7 @@ config ADDRESS_SPACE_ISOLATION default n depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM depends on !PARAVIRT + depends on !MEMORY_HOTPLUG help This feature provides the ability to run some kernel code with a reduced kernel address space. 
This can be used to --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06301C433EF for ; Wed, 23 Feb 2022 05:25:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238274AbiBWF0L (ORCPT ); Wed, 23 Feb 2022 00:26:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238238AbiBWFZi (ORCPT ); Wed, 23 Feb 2022 00:25:38 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 600506C922 for ; Tue, 22 Feb 2022 21:24:39 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id z15-20020a25bb0f000000b00613388c7d99so26693168ybg.8 for ; Tue, 22 Feb 2022 21:24:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=1WKa+KZIcaa+dJcFTsqZOSts6oD//SyJ1kbRYfoMX2c=; b=sZHRLO9kxK5TzWD0X0seDuzrMm3IaK4jRNNmTtrhccA0nwFV9CPHeE86cjQT27AmkI v7sdoVjafaJDxmnspYK1IJDte5ixXIP4NlIpzJcvsi6oStzTdw2xXhTTK6hSh2PE8S4p mF1cpkFTM3aKbNKaOvkN2Z8/pjjAmAg7yV5uvedIm/mcA50u4mXY9mOc7gt5zMPS72Jb jLpByILet1L0UsVaxDROFUGR6gYkZbc/Z5Ipq4VnAgrCI+T/jtIuqYuHmNGwLB/tOVe/ jw6HvYSt2X5Np3Q4sH2T+yBRKnSfg6rhn+2SA7NrFtkru9npVZHpssAQN1BJYkVWWTVb OPZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=1WKa+KZIcaa+dJcFTsqZOSts6oD//SyJ1kbRYfoMX2c=; b=GD1UdBSUHiarbFcurw+iR62yF9pv2ySP6wQlp8xM2xNxfmmwDOg5n5yDwMuunYgDTZ uP/2P7esxP8bNKXwN6jSMh7KDIzSAi4wCx9eXcNOUk6tGRNrWUTCc+fBzP9lNGyG7niT 
0Qk+Sr0+ftlr2oGNyvXhfvDf68FeejNwxraD/Ju3UOA5/iuwClvHGROclHPkliQ5T/8k o9JjmCE6jgc80rtVQy7ixwOkRTp1HXPBgmui+F7uBDDqzbDzAAhtmI79pqaxAbzE/cAj gehtXOgD9C32it5ZALXIYUoM22g6Ez2d1xnC937+XZb6ma6ZTg5VxX/bZswQJ0/dJX0y nOOA== X-Gm-Message-State: AOAM5309ZUP0kw/fRzfLV/iI8cleUO4YYh6c5+BJvn6/3+EJ5/SSUbjb Apb5SaUNi9R3bYaM954ov5iCbUJQ81PuXVMbOpwW3fJlpP0DfkQXTPx6XQuEi+1Gs2QLrfvYmK0 I7ZQ+neNQbH4bF6d+TeBh5wb2XFNQGJBVwCOx1K28ZIZANlzYJK4Ks1bc2sukIUc5OBjeDGsF X-Google-Smtp-Source: ABdhPJzkV7QmjkoELH2kQlDpniWdBZ4HcBwkQjVMUj/MiMOJhgvGDpbnFOCshsSuGHfkjD35goEqd0h2ttTQ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7848:0:b0:2ca:287c:6ce3 with SMTP id t69-20020a817848000000b002ca287c6ce3mr26938064ywc.392.1645593865848; Tue, 22 Feb 2022 21:24:25 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:54 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-19-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 18/47] mm: asi: Support for pre-ASI-init local non-sensitive allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Local non-sensitive allocations can be made before an actual ASI instance is initialized. To support this, a process-wide pseudo-PGD is created, which contains mappings for all locally non-sensitive allocations. Memory can be mapped into this pseudo-PGD by using ASI_LOCAL_NONSENSITIVE when calling asi_map(). 
The mappings will be copied to an actual ASI PGD when an ASI instance is initialized in that process, by copying all the PGD entries in the local non-sensitive range from the pseudo-PGD to the ASI PGD. In addition, the page fault handler will copy any new PGD entries that get added after the initialization of the ASI instance. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 6 +++- arch/x86/mm/asi.c | 74 +++++++++++++++++++++++++++++++++++++- arch/x86/mm/fault.c | 7 ++++ include/asm-generic/asi.h | 12 ++++++- kernel/fork.c | 8 +++-- 5 files changed, 102 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f69e1f2f09a4..f11010c0334b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -16,6 +16,7 @@ #define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) =20 #define ASI_GLOBAL_NONSENSITIVE (&init_mm.asi[0]) +#define ASI_LOCAL_NONSENSITIVE (&current->mm->asi[0]) =20 struct asi_state { struct asi *curr_asi; @@ -45,7 +46,8 @@ DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); =20 extern pgd_t asi_global_nonsensitive_pgd[]; =20 -void asi_init_mm_state(struct mm_struct *mm); +int asi_init_mm_state(struct mm_struct *mm); +void asi_free_mm_state(struct mm_struct *mm); =20 int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops); @@ -61,6 +63,8 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len,= gfp_t gfp_flags); int asi_map(struct asi *asi, void *addr, size_t len); void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb); void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); +void asi_sync_mapping(struct asi *asi, void *addr, size_t len); +void asi_do_lazy_map(struct asi *asi, size_t addr); =20 static inline void asi_init_thread_state(struct thread_struct *thread) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 38eaa650bac1..3ba0971a318d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -73,6 +73,17 @@ void
asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); =20 +static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) +{ + pgd_t *src =3D pgd_offset_pgd(src_table, addr); + pgd_t *dst =3D pgd_offset_pgd(dst_table, addr); + + if (!pgd_val(*dst)) + set_pgd(dst, *src); + else + VM_BUG_ON(pgd_val(*dst) !=3D pgd_val(*src)); +} + #ifndef mm_inc_nr_p4ds #define mm_inc_nr_p4ds(mm) do {} while (false) #endif @@ -291,6 +302,11 @@ int asi_init(struct mm_struct *mm, int asi_index, stru= ct asi **out_asi) for (i =3D KERNEL_PGD_BOUNDARY; i < pgd_index(ASI_LOCAL_MAP); i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); =20 + for (i =3D pgd_index(ASI_LOCAL_MAP); + i <=3D pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)); + i++) + set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); + for (i =3D pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); @@ -379,7 +395,7 @@ void asi_exit(void) } EXPORT_SYMBOL_GPL(asi_exit); =20 -void asi_init_mm_state(struct mm_struct *mm) +int asi_init_mm_state(struct mm_struct *mm) { struct mem_cgroup *memcg =3D get_mem_cgroup_from_mm(mm); =20 @@ -395,6 +411,28 @@ void asi_init_mm_state(struct mm_struct *mm) memcg->use_asi; css_put(&memcg->css); } + + if (!mm->asi_enabled) + return 0; + + mm->asi[0].mm =3D mm; + mm->asi[0].pgd =3D (pgd_t *)__get_free_page(GFP_PGTABLE_USER); + if (!mm->asi[0].pgd) + return -ENOMEM; + + return 0; +} + +void asi_free_mm_state(struct mm_struct *mm) +{ + if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) + return; + + asi_free_pgd_range(&mm->asi[0], pgd_index(ASI_LOCAL_MAP), + pgd_index(ASI_LOCAL_MAP + + PFN_PHYS(max_possible_pfn)) + 1); + + free_page((ulong)mm->asi[0].pgd); } =20 static bool is_page_within_range(size_t addr, size_t page_size, @@ -599,3 +637,37 @@ void *asi_va(unsigned long pa) ? 
ASI_LOCAL_MAP : PAGE_OFFSET)); } EXPORT_SYMBOL(asi_va); + +static bool is_addr_in_local_nonsensitive_range(size_t addr) +{ + return addr >=3D ASI_LOCAL_MAP && + addr < VMALLOC_GLOBAL_NONSENSITIVE_START; +} + +void asi_do_lazy_map(struct asi *asi, size_t addr) +{ + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && + is_addr_in_local_nonsensitive_range(addr)) + asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); +} + +/* + * Should be called after asi_map(ASI_LOCAL_NONSENSITIVE,...) for any mapp= ing + * that is required to exist prior to asi_enter() (e.g. thread stacks) + */ +void asi_sync_mapping(struct asi *asi, void *start, size_t len) +{ + size_t addr =3D (size_t)start; + size_t end =3D addr + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && + is_addr_in_local_nonsensitive_range(addr)) + for (; addr < end; addr =3D pgd_addr_end(addr, end)) + asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); +} diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 4bfed53e210e..8692eb50f4a5 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1498,6 +1498,12 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) { unsigned long address =3D read_cr2(); irqentry_state_t state; + /* + * There is a very small chance that an NMI could cause an asi_exit() + * before this asi_get_current(), but that is ok, we will just do + * the fixup on the next page fault. 
+ */ + struct asi *asi =3D asi_get_current(); =20 prefetchw(&current->mm->mmap_lock); =20 @@ -1539,6 +1545,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) =20 instrumentation_begin(); handle_page_fault(regs, error_code, address); + asi_do_lazy_map(asi, address); instrumentation_end(); =20 irqentry_exit(regs, state); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 51c9c4a488e8..a1c8ebff70e8 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -13,6 +13,7 @@ #define ASI_MAX_NUM 0 =20 #define ASI_GLOBAL_NONSENSITIVE NULL +#define ASI_LOCAL_NONSENSITIVE NULL =20 #define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END @@ -31,7 +32,9 @@ int asi_register_class(const char *name, uint flags, =20 static inline void asi_unregister_class(int asi_index) { } =20 -static inline void asi_init_mm_state(struct mm_struct *mm) { } +static inline int asi_init_mm_state(struct mm_struct *mm) { return 0; } + +static inline void asi_free_mm_state(struct mm_struct *mm) { } =20 static inline int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) @@ -67,9 +70,16 @@ static inline int asi_map(struct asi *asi, void *addr, s= ize_t len) return 0; } =20 +static inline +void asi_sync_mapping(struct asi *asi, void *addr, size_t len) { } + static inline void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { } =20 + +static inline +void asi_do_lazy_map(struct asi *asi, size_t addr) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } =20 diff --git a/kernel/fork.c b/kernel/fork.c index 3695a32ee9bd..dd5a86e913ea 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -699,6 +699,7 @@ void __mmdrop(struct mm_struct *mm) mm_free_pgd(mm); destroy_context(mm); mmu_notifier_subscriptions_destroy(mm); + asi_free_mm_state(mm); check_mm(mm); put_user_ns(mm->user_ns); free_mm(mm); @@ -1072,17 +1073,20 @@ static struct mm_struct *mm_init(struct
mm_struct *= mm, struct task_struct *p, mm->def_flags =3D 0; } =20 - asi_init_mm_state(mm); - if (mm_alloc_pgd(mm)) goto fail_nopgd; =20 if (init_new_context(p, mm)) goto fail_nocontext; =20 + if (asi_init_mm_state(mm)) + goto fail_noasi; + mm->user_ns =3D get_user_ns(user_ns); + return mm; =20 +fail_noasi: fail_nocontext: mm_free_pgd(mm); fail_nopgd: --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85223C433EF for ; Wed, 23 Feb 2022 05:25:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238281AbiBWF0O (ORCPT ); Wed, 23 Feb 2022 00:26:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238284AbiBWFZj (ORCPT ); Wed, 23 Feb 2022 00:25:39 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF49D6C93B for ; Tue, 22 Feb 2022 21:24:39 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d6994a6942so160468337b3.17 for ; Tue, 22 Feb 2022 21:24:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=0QeFswLcX9iEWT+ZD7rOizRcb8mHIR1VarnP+zim9mU=; b=AoOb1g/bwih9VgHUpDsuRj472kE7mvs4Hd5rKJ629FXGqay4OsONzwnNGCn4KSszBQ 7tR0Ehd/9kdVBMulmZqqn0EKhuRWwo5FmNiVUwzvcobRe45iRRTjqmq29bq6BO+aH2aK iVFm5Q63DJUj0wcMPLNxFHfvSAy/DcNXa1BAm45xpqSab7/o/iDvyz2xnot3Gux2i/bz ep5/x08uO4oi4Oa/VRppWnEJLR30fLBFaY+9aHNaO68XzDzENmX6oojUDxShaZpJxCzn AzyiCxL/E+FUObYZMNwPUDfM+bxPv2Rn6QD88aXQUJCPLfukjGOAKhVZbWGz7HVG4WwU EzCA== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=0QeFswLcX9iEWT+ZD7rOizRcb8mHIR1VarnP+zim9mU=; b=E7Im9oecyxT2Lj8yBrnCmau/3RknD5OpPN5YuF6GjpprowmEn+am/Ss0xZcftJialT SPJedVaDOoG+a5lxdVc3sADPzFqmlDrMlu+zAjsXtEJLUdmTXEqALk3C/3FoIJcXlerz SMHasAVXc2egrq+lAfocCOdAXnyIb9C6g7bjMtnrVOrwfBiycUkMM+nUUAdMvuQ5grxm G1Ua1cpJkrNB7QeZ19O7EbPpQkdkYrProJo+1hDRNLNCkFtq+oS2ePsjSsd22MJrGuWH 2HkMWlgT0hAsdWz5wVWKHkofJiXzXfB/GHcEG6pTZiekJNHgnrTaZG0fy+Y1qkGJ8gDJ mqmQ== X-Gm-Message-State: AOAM533UqvkognIajUVYZMbwh4lbZBv0hy/bey3lkOrVVdzpObrFiLly 6LhsOPhDiQ+7/7QJtUZVXxW/a47bSPIZQ1Gm/T7BvKXPc88n7WIbqHyb2DNG1mb2pA52S/vOyBv uzkhFov/u9BoLsKJzycNItBFLWr0PPTINxknhni0WqmkKhgp74XXesJl0tksEoEW9BSDv3hD3 X-Google-Smtp-Source: ABdhPJx4QwVPvcY0cnBZCAO3p1suuFvqTv6OhknMLXEOsTBctmZQz9YZ5chFPkEZtJ5h60746WaCODirXSsF X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6fc1:0:b0:624:43a0:c16c with SMTP id k184-20020a256fc1000000b0062443a0c16cmr21681170ybc.222.1645593868088; Tue, 22 Feb 2022 21:24:28 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:55 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-20-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 19/47] mm: asi: Support for locally nonsensitive page allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 
A new GFP flag, __GFP_LOCAL_NONSENSITIVE, is added to allocate pages that are considered non-sensitive within the context of the current process, but sensitive in the context of other processes. For these allocations, page->asi_mm is set to the current mm during allocation. It must be set to the same value when the page is freed, though it may be overwritten and used for some other purpose in the meantime, as long as it is restored before freeing. Signed-off-by: Junaid Shahid --- include/linux/gfp.h | 5 +++- include/linux/mm_types.h | 17 ++++++++++-- include/trace/events/mmflags.h | 1 + mm/page_alloc.c | 47 ++++++++++++++++++++++++++++------ tools/perf/builtin-kmem.c | 1 + 5 files changed, 60 insertions(+), 11 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 07a99a463a34..2ab394adbda3 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -62,8 +62,10 @@ struct vm_area_struct; #endif #ifdef CONFIG_ADDRESS_SPACE_ISOLATION #define ___GFP_GLOBAL_NONSENSITIVE 0x4000000u +#define ___GFP_LOCAL_NONSENSITIVE 0x8000000u #else #define ___GFP_GLOBAL_NONSENSITIVE 0 +#define ___GFP_LOCAL_NONSENSITIVE 0 #endif /* If the above are modified, __GFP_BITS_SHIFT may need updating */ =20 @@ -255,9 +257,10 @@ struct vm_area_struct; =20 /* Allocate non-sensitive memory */ #define __GFP_GLOBAL_NONSENSITIVE ((__force gfp_t)___GFP_GLOBAL_NONSENSITI= VE) +#define __GFP_LOCAL_NONSENSITIVE ((__force gfp_t)___GFP_LOCAL_NONSENSITIVE) =20 /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT 27 +#define __GFP_BITS_SHIFT 28 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) =20 /** diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8624d2783661..f9702d070975 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -193,8 +193,21 @@ struct page { struct rcu_head rcu_head; =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION - /* Links the pages_to_free_async list */ - struct llist_node async_free_node; +
struct { + /* Links the pages_to_free_async list */ + struct llist_node async_free_node; + + unsigned long _asi_pad_1; + unsigned long _asi_pad_2; + + /* + * Upon allocation of a locally non-sensitive page, set + * to the allocating mm. Must be set to the same mm when + * the page is freed. May potentially be overwritten in + * the meantime, as long as it is restored before free. + */ + struct mm_struct *asi_mm; + }; #endif }; =20 diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 96e61d838bec..c00b8a4e1968 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -51,6 +51,7 @@ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\ {(unsigned long)__GFP_ZEROTAGS, "__GFP_ZEROTAGS"}, \ {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"},\ + {(unsigned long)__GFP_LOCAL_NONSENSITIVE, "__GFP_LOCAL_NONSENSITIVE"},\ {(unsigned long)__GFP_GLOBAL_NONSENSITIVE, "__GFP_GLOBAL_NONSENSITIVE"}\ =20 #define show_gfp_flags(flags) \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a4048fa1868a..01784bff2a80 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5231,19 +5231,33 @@ early_initcall(asi_page_alloc_init); static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_= mask) { uint i; + struct asi *asi; + + VM_BUG_ON((gfp_mask & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE)) =3D=3D + (__GFP_GLOBAL_NONSENSITIVE | __GFP_LOCAL_NONSENSITIVE)); =20 if (!static_asi_enabled()) return 0; =20 + if (!(gfp_mask & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE))) + return 0; + if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) { + asi =3D ASI_GLOBAL_NONSENSITIVE; for (i =3D 0; i < (1 << order); i++) __SetPageGlobalNonSensitive(page + i); - - return asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page), - PAGE_SIZE * (1 << order), gfp_mask); + } else { + asi =3D ASI_LOCAL_NONSENSITIVE; + for (i =3D 0; i < (1 << order); i++) { + __SetPageLocalNonSensitive(page + i); + 
page[i].asi_mm =3D current->mm; + } } =20 - return 0; + return asi_map_gfp(asi, page_to_virt(page), + PAGE_SIZE * (1 << order), gfp_mask); } =20 static bool asi_unmap_freed_pages(struct page *page, unsigned int order) @@ -5251,18 +5265,28 @@ static bool asi_unmap_freed_pages(struct page *page= , unsigned int order) void *va; size_t len; bool async_flush_needed; + struct asi *asi; + + VM_BUG_ON(PageGlobalNonSensitive(page) && PageLocalNonSensitive(page)); =20 if (!static_asi_enabled()) return true; =20 - if (!PageGlobalNonSensitive(page)) + if (PageGlobalNonSensitive(page)) + asi =3D ASI_GLOBAL_NONSENSITIVE; + else if (PageLocalNonSensitive(page)) + asi =3D &page->asi_mm->asi[0]; + else return true; =20 + /* Heuristic to check that page->asi_mm is actually an mm_struct */ + VM_BUG_ON(PageLocalNonSensitive(page) && asi->mm !=3D page->asi_mm); + va =3D page_to_virt(page); len =3D PAGE_SIZE * (1 << order); async_flush_needed =3D irqs_disabled() || in_interrupt(); =20 - asi_unmap(ASI_GLOBAL_NONSENSITIVE, va, len, !async_flush_needed); + asi_unmap(asi, va, len, !async_flush_needed); =20 if (!async_flush_needed) return true; @@ -5476,8 +5500,15 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int o= rder, int preferred_nid, return NULL; } =20 - if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) - gfp |=3D __GFP_ZERO; + if (static_asi_enabled()) { + if ((gfp & __GFP_LOCAL_NONSENSITIVE) && + !mm_asi_enabled(current->mm)) + gfp &=3D ~__GFP_LOCAL_NONSENSITIVE; + + if (gfp & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE)) + gfp |=3D __GFP_ZERO; + } =20 gfp &=3D gfp_allowed_mask; /* diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index 5857953cd5c1..a2337fc3404f 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -661,6 +661,7 @@ static const struct { { "__GFP_DIRECT_RECLAIM", "DR" }, { "__GFP_KSWAPD_RECLAIM", "KR" }, { "__GFP_GLOBAL_NONSENSITIVE", "GNS" }, + { "__GFP_LOCAL_NONSENSITIVE", "LNS" }, }; =20 static 
size_t max_gfp_len; --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A1DCC433EF for ; Wed, 23 Feb 2022 05:26:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238324AbiBWF0Z (ORCPT ); Wed, 23 Feb 2022 00:26:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58756 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238302AbiBWFZl (ORCPT ); Wed, 23 Feb 2022 00:25:41 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7F0B6C976 for ; Tue, 22 Feb 2022 21:24:42 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id s11-20020a255e0b000000b0062277953037so26417513ybb.21 for ; Tue, 22 Feb 2022 21:24:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=KctzOijb2lPzQRsVweA+t5lqX+AMon2xx7uJ8sEe3V0=; b=o6x+fXjXHT+YmW/ZEagw0M3GAQliRWz6GkGaqA4u+vbUpzhuqjZg6dpP5meVpt0ROf kKIgOplxmZwcy8fpFEZTWPlgRlLYr6BEMXfWuy6EkVszZL1eiXWI04tPu/Bave36tyh0 Pg/rLhImfsoHjz3O2aw8506JPbQ7x/tOlKAxNgPJdEMnSjZ93q2iD4DVERcopQg4JNYp GNKE8XJSp9XzGQ0dIwkM6cgZOM8UkwzVOfAn0JxbE5VvBEVsLLUtWn7Lc0a3H+tJ18Mo xFRsJWo21wVW8cCrGTePkWMS8ReiZ3Pptclz1bJcEt4nXmu0gRlgbulAgKKuDIl4DXV3 55IA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=KctzOijb2lPzQRsVweA+t5lqX+AMon2xx7uJ8sEe3V0=; b=DXmmMXw5jOJCPoCEB9rxBqwu3kp9j6xDyFJ0jRZ8IOtYh4U3mikfGLXUDnmYZuqYLq 2n5qXSegLN/QwpPFmDJFUQP/EZwwooouSXtNdq9iCGPDfbyhhRGZmwS7lmxktLSW8w4I 
2CAevgZ5GiFcDRHnuTCjxIluBFN2VX5oFaNS1Al7UqGtuRaI9yxUHrd5rAu2/3HBiaI8 VeyMXu7SUrM6ZEMCZk95Ekj7uoo4/bL0XAJ2LQcsnRrjfGJtIav6y/oNy/Xy3X1RTSN0 is+OJBo7Gio0po9Nu2gKcduqEMVV3cw2k+6zmBcKOdpY3DufMjwwuTt0xlPXmBoCRT+P XtCw== X-Gm-Message-State: AOAM532oANRE+/5WrA2wqMOY+oDtH2k1OABeTfHtGjDVpcSOKZwEy88l kfdxQ89JoTP9WdAseObTIMLRgc2SFBxkBusr1ulO/kTgvhYkdSTFccO9qUipA1QzdcJOIBZyB5/ uqxcGodfGVwtzNPJrl7oqQWiNb7uVwM9aNpRBzYhrG5KWzNnDCK/ul8wbcRG8kFsGRxTqtZi3 X-Google-Smtp-Source: ABdhPJz4jScT7T8fgZ/6SCKV/MEmg4Vf6MpNhLflLFYdfY4h7Ie3CFO275rWWB7jrjHpkdQgBhPjBxZoyl00 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:d90b:0:b0:61d:e8c7:82ff with SMTP id q11-20020a25d90b000000b0061de8c782ffmr26287345ybg.171.1645593870304; Tue, 22 Feb 2022 21:24:30 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:56 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-21-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 20/47] mm: asi: Support for locally non-sensitive vmalloc allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A new flag, VM_LOCAL_NONSENSITIVE is added to designate locally non-sensitive vmalloc/vmap areas. When using the __vmalloc / __vmalloc_node APIs, if the corresponding GFP flag is specified, the VM flag is automatically added. When using the __vmalloc_node_range API, either flag can be specified independently. 
The VM flag will only map the vmalloc area as non-sensitive, while the GFP flag will only map the underlying direct map area as non-sensitive. When using the __vmalloc_node_range API for locally non-sensitive allocations, VMALLOC_LOCAL_NONSENSITIVE_START/END should be used instead of VMALLOC_START/END. This is the range that will have different ASI page tables for each process, thereby providing the local mapping. A command line parameter vmalloc_local_nonsensitive_percent is added to specify the approximate division between the per-process and global vmalloc ranges. Note that regular/sensitive vmalloc/vmap allocations are not restricted by this division and can go anywhere in the entire vmalloc range. The division only applies to non-sensitive allocations. Since no attempt is made to balance regular/sensitive allocations across the division, it is possible that one of these ranges gets filled up by regular allocations, leaving no room for the non-sensitive allocations for which that range was designated. However, since the vmalloc range is fairly large, this will hopefully not be a problem in practice. If that assumption turns out to be incorrect, we could implement a more sophisticated scheme.
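As a rough illustration of how the percentage translates into a boundary, here is a user-space sketch of the same arithmetic that asi_vmalloc_init() performs. It is not kernel code: it assumes x86-64 with 4-level paging and no memory-layout randomization (so VMALLOC_START is 0xffffc90000000000 and the range is 32 TB), and the sketch_* helper names are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: x86-64, 4-level paging, vmalloc base not randomized. */
#define PGDIR_SHIFT      39
#define PTRS_PER_PGD     512u
#define PGDIR_SIZE       (UINT64_C(1) << PGDIR_SHIFT)
#define VMALLOC_START    UINT64_C(0xffffc90000000000)
#define VMALLOC_SIZE_TB  UINT64_C(32)
#define VMALLOC_END      (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)

/* Index of the top-level page table entry that covers addr. */
static unsigned int pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Hypothetical helper: first address of the global non-sensitive vmalloc
 * range for a given vmalloc_local_nonsensitive_percent value. The split
 * is rounded down to a PGD entry boundary, hence "approximate".
 */
static uint64_t sketch_global_nonsensitive_start(unsigned int percent)
{
	unsigned int start_index = pgd_index(VMALLOC_START);
	unsigned int end_index = pgd_index(VMALLOC_END);
	unsigned int global_start_index;

	/* Clamp to 1..99, as the patch does for out-of-range values. */
	if (percent == 0)
		percent = 1;
	if (percent >= 100)
		percent = 99;

	global_start_index = start_index +
			     (end_index - start_index) * percent / 100;

	/* The local range always gets at least one PGD entry. */
	if (global_start_index < start_index + 1)
		global_start_index = start_index + 1;

	assert(global_start_index > start_index &&
	       global_start_index <= end_index);

	/* Kernel addresses count down from the top of the address space. */
	return UINT64_C(0) -
	       (uint64_t)(PTRS_PER_PGD - global_start_index) * PGDIR_SIZE;
}

/* Hypothetical helper: last address of the per-process (local) range. */
static uint64_t sketch_local_nonsensitive_end(unsigned int percent)
{
	return sketch_global_nonsensitive_start(percent) - 1;
}
```

Under these assumptions the vmalloc range spans PGD entries 402 through 465, so the default of 50 percent places the global range at entry 433, i.e. sketch_global_nonsensitive_start(50) returns 0xffffd88000000000. Because the boundary moves in whole PGD entries (512 GB each with 4-level paging), nearby percentages can yield the same split.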
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 + arch/x86/include/asm/page_64.h | 2 + arch/x86/include/asm/pgtable_64_types.h | 7 ++- arch/x86/mm/asi.c | 57 ++++++++++++++++++ include/asm-generic/asi.h | 5 ++ include/linux/vmalloc.h | 6 ++ mm/vmalloc.c | 78 ++++++++++++++++++++----- 7 files changed, 142 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f11010c0334b..e3cbf6d8801e 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -46,6 +46,8 @@ DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); =20 extern pgd_t asi_global_nonsensitive_pgd[]; =20 +void asi_vmalloc_init(void); + int asi_init_mm_state(struct mm_struct *mm); void asi_free_mm_state(struct mm_struct *mm); =20 diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index 2845eca02552..b17574349572 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -18,6 +18,8 @@ extern unsigned long vmemmap_base; =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION =20 +extern unsigned long vmalloc_global_nonsensitive_start; +extern unsigned long vmalloc_local_nonsensitive_end; extern unsigned long asi_local_map_base; DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); =20 diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm= /pgtable_64_types.h index 0fc380ba25b8..06793f7ef1aa 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -142,8 +142,13 @@ extern unsigned int ptrs_per_p4d; #define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1) =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION -#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START + +#define VMALLOC_LOCAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_LOCAL_NONSENSITIVE_END vmalloc_local_nonsensitive_end + +#define VMALLOC_GLOBAL_NONSENSITIVE_START vmalloc_global_nonsensitive_start #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END + #endif 
=20 #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 3ba0971a318d..91e5ff1224ff 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -3,6 +3,7 @@ #include #include #include +#include =20 #include #include @@ -28,6 +29,17 @@ EXPORT_SYMBOL(asi_local_map_initialized); unsigned long asi_local_map_base __ro_after_init; EXPORT_SYMBOL(asi_local_map_base); =20 +unsigned long vmalloc_global_nonsensitive_start __ro_after_init; +EXPORT_SYMBOL(vmalloc_global_nonsensitive_start); + +unsigned long vmalloc_local_nonsensitive_end __ro_after_init; +EXPORT_SYMBOL(vmalloc_local_nonsensitive_end); + +/* Approximate percent only. Rounded to PGDIR_SIZE boundary. */ +static uint vmalloc_local_nonsensitive_percent __ro_after_init =3D 50; +core_param(vmalloc_local_nonsensitive_percent, + vmalloc_local_nonsensitive_percent, uint, 0444); + int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops) { @@ -307,6 +319,10 @@ int asi_init(struct mm_struct *mm, int asi_index, stru= ct asi **out_asi) i++) set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); =20 + for (i =3D pgd_index(VMALLOC_LOCAL_NONSENSITIVE_START); + i <=3D pgd_index(VMALLOC_LOCAL_NONSENSITIVE_END); i++) + set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); + for (i =3D pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); @@ -432,6 +448,10 @@ void asi_free_mm_state(struct mm_struct *mm) pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)) + 1); =20 + asi_free_pgd_range(&mm->asi[0], + pgd_index(VMALLOC_LOCAL_NONSENSITIVE_START), + pgd_index(VMALLOC_LOCAL_NONSENSITIVE_END) + 1); + free_page((ulong)mm->asi[0].pgd); } =20 @@ -671,3 +691,40 @@ void asi_sync_mapping(struct asi *asi, void *start, si= ze_t len) for (; addr < end; addr =3D pgd_addr_end(addr, end)) asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); } + +void __init asi_vmalloc_init(void) +{ + uint start_index =3D 
pgd_index(VMALLOC_START); + uint end_index =3D pgd_index(VMALLOC_END); + uint global_start_index; + + if (!boot_cpu_has(X86_FEATURE_ASI)) { + vmalloc_global_nonsensitive_start =3D VMALLOC_START; + vmalloc_local_nonsensitive_end =3D VMALLOC_END; + return; + } + + if (vmalloc_local_nonsensitive_percent =3D=3D 0) { + vmalloc_local_nonsensitive_percent =3D 1; + pr_warn("vmalloc_local_nonsensitive_percent must be non-zero"); + } + + if (vmalloc_local_nonsensitive_percent >=3D 100) { + vmalloc_local_nonsensitive_percent =3D 99; + pr_warn("vmalloc_local_nonsensitive_percent must be less than 100"); + } + + global_start_index =3D start_index + (end_index - start_index) * + vmalloc_local_nonsensitive_percent / 100; + global_start_index =3D max(global_start_index, start_index + 1); + + vmalloc_global_nonsensitive_start =3D -(PTRS_PER_PGD - global_start_index) + * PGDIR_SIZE; + vmalloc_local_nonsensitive_end =3D vmalloc_global_nonsensitive_start - 1; + + pr_debug("vmalloc_global_nonsensitive_start =3D %llx", + vmalloc_global_nonsensitive_start); + + VM_BUG_ON(vmalloc_local_nonsensitive_end >=3D VMALLOC_END); + VM_BUG_ON(vmalloc_global_nonsensitive_start <=3D VMALLOC_START); +} diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index a1c8ebff70e8..7c50d8b64fa4 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -18,6 +18,9 @@ #define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END =20 +#define VMALLOC_LOCAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_LOCAL_NONSENSITIVE_END VMALLOC_END + #ifndef _ASSEMBLY_ =20 struct asi_hooks {}; @@ -36,6 +39,8 @@ static inline int asi_init_mm_state(struct mm_struct *mm)= { return 0; } =20 static inline void asi_free_mm_state(struct mm_struct *mm) { } =20 +static inline void asi_vmalloc_init(void) { } + static inline int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { diff --git a/include/linux/vmalloc.h 
b/include/linux/vmalloc.h index 5f85690f27b6..2b4eafc21fa5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -41,8 +41,10 @@ struct notifier_block; /* in notifier.h */ =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION #define VM_GLOBAL_NONSENSITIVE 0x00000800 /* Similar to __GFP_GLOBAL_NONSE= NSITIVE */ +#define VM_LOCAL_NONSENSITIVE 0x00001000 /* Similar to __GFP_LOCAL_NONSENS= ITIVE */ #else #define VM_GLOBAL_NONSENSITIVE 0 +#define VM_LOCAL_NONSENSITIVE 0 #endif =20 /* bits [20..32] reserved for arch specific ioremap internals */ @@ -67,6 +69,10 @@ struct vm_struct { unsigned int nr_pages; phys_addr_t phys_addr; const void *caller; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Valid if flags contain VM_*_NONSENSITIVE */ + struct asi *asi; +#endif }; =20 struct vmap_area { diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f13bfe7e896b..ea94d8a1e2e9 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2391,18 +2391,25 @@ void __init vmalloc_init(void) */ vmap_init_free_space(); vmap_initialized =3D true; + + asi_vmalloc_init(); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + static int asi_map_vm_area(struct vm_struct *area) { if (!static_asi_enabled()) return 0; =20 if (area->flags & VM_GLOBAL_NONSENSITIVE) - return asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr, - get_vm_area_size(area)); + area->asi =3D ASI_GLOBAL_NONSENSITIVE; + else if (area->flags & VM_LOCAL_NONSENSITIVE) + area->asi =3D ASI_LOCAL_NONSENSITIVE; + else + return 0; =20 - return 0; + return asi_map(area->asi, area->addr, get_vm_area_size(area)); } =20 static void asi_unmap_vm_area(struct vm_struct *area) @@ -2415,11 +2422,17 @@ static void asi_unmap_vm_area(struct vm_struct *are= a) * the case when the existing flush from try_purge_vmap_area_lazy() * and/or vm_unmap_aliases() happens non-lazily. 
*/ - if (area->flags & VM_GLOBAL_NONSENSITIVE) - asi_unmap(ASI_GLOBAL_NONSENSITIVE, area->addr, - get_vm_area_size(area), true); + if (area->flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) + asi_unmap(area->asi, area->addr, get_vm_area_size(area), true); } =20 +#else + +static inline int asi_map_vm_area(struct vm_struct *area) { return 0; } +static inline void asi_unmap_vm_area(struct vm_struct *area) { } + +#endif + static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { @@ -2463,6 +2476,15 @@ static struct vm_struct *__get_vm_area_node(unsigned= long size, if (unlikely(!size)) return NULL; =20 + if (static_asi_enabled()) { + VM_BUG_ON((flags & VM_LOCAL_NONSENSITIVE) && + !(start >=3D VMALLOC_LOCAL_NONSENSITIVE_START && + end <=3D VMALLOC_LOCAL_NONSENSITIVE_END)); + + VM_BUG_ON((flags & VM_GLOBAL_NONSENSITIVE) && + start < VMALLOC_GLOBAL_NONSENSITIVE_START); + } + if (flags & VM_IOREMAP) align =3D 1ul << clamp_t(int, get_count_order_long(size), PAGE_SHIFT, IOREMAP_MAX_ORDER); @@ -3073,8 +3095,22 @@ void *__vmalloc_node_range(unsigned long size, unsig= ned long align, if (WARN_ON_ONCE(!size)) return NULL; =20 - if (static_asi_enabled() && (vm_flags & VM_GLOBAL_NONSENSITIVE)) - gfp_mask |=3D __GFP_ZERO; + if (static_asi_enabled()) { + VM_BUG_ON((vm_flags & (VM_LOCAL_NONSENSITIVE | + VM_GLOBAL_NONSENSITIVE)) =3D=3D + (VM_LOCAL_NONSENSITIVE | VM_GLOBAL_NONSENSITIVE)); + + if ((vm_flags & VM_LOCAL_NONSENSITIVE) && + !mm_asi_enabled(current->mm)) { + vm_flags &=3D ~VM_LOCAL_NONSENSITIVE; + + if (end =3D=3D VMALLOC_LOCAL_NONSENSITIVE_END) + end =3D VMALLOC_END; + } + + if (vm_flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) + gfp_mask |=3D __GFP_ZERO; + } =20 if ((size >> PAGE_SHIFT) > totalram_pages()) { warn_alloc(gfp_mask, NULL, @@ -3166,11 +3202,19 @@ void *__vmalloc_node(unsigned long size, unsigned l= ong align, gfp_t gfp_mask, int node, const void *caller) { ulong vm_flags 
=3D 0; + ulong start =3D VMALLOC_START, end =3D VMALLOC_END; =20 - if (static_asi_enabled() && (gfp_mask & __GFP_GLOBAL_NONSENSITIVE)) - vm_flags |=3D VM_GLOBAL_NONSENSITIVE; + if (static_asi_enabled()) { + if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) { + vm_flags |=3D VM_GLOBAL_NONSENSITIVE; + start =3D VMALLOC_GLOBAL_NONSENSITIVE_START; + } else if (gfp_mask & __GFP_LOCAL_NONSENSITIVE) { + vm_flags |=3D VM_LOCAL_NONSENSITIVE; + end =3D VMALLOC_LOCAL_NONSENSITIVE_END; + } + } =20 - return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, + return __vmalloc_node_range(size, align, start, end, gfp_mask, PAGE_KERNEL, vm_flags, node, caller); } /* @@ -3678,9 +3722,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned = long *offsets, /* verify parameters and allocate data structures */ BUG_ON(offset_in_page(align) || !is_power_of_2(align)); =20 - if (static_asi_enabled() && (flags & VM_GLOBAL_NONSENSITIVE)) { - vmalloc_start =3D VMALLOC_GLOBAL_NONSENSITIVE_START; - vmalloc_end =3D VMALLOC_GLOBAL_NONSENSITIVE_END; + if (static_asi_enabled()) { + VM_BUG_ON((flags & (VM_LOCAL_NONSENSITIVE | + VM_GLOBAL_NONSENSITIVE)) =3D=3D + (VM_LOCAL_NONSENSITIVE | VM_GLOBAL_NONSENSITIVE)); + + if (flags & VM_GLOBAL_NONSENSITIVE) + vmalloc_start =3D VMALLOC_GLOBAL_NONSENSITIVE_START; + else if (flags & VM_LOCAL_NONSENSITIVE) + vmalloc_end =3D VMALLOC_LOCAL_NONSENSITIVE_END; } =20 vmalloc_start =3D ALIGN(vmalloc_start, align); --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D173CC433EF for ; Wed, 23 Feb 2022 05:26:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229720AbiBWF0n (ORCPT ); Wed, 23 Feb 2022 00:26:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57700 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238320AbiBWFZl (ORCPT ); Wed, 23 Feb 2022 00:25:41 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0C456D863 for ; Tue, 22 Feb 2022 21:24:46 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id e129-20020a25d387000000b006245d830ca6so12323810ybf.13 for ; Tue, 22 Feb 2022 21:24:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=wpN/U5eluX6Jnat+pEDX2C0UlbFhsXsgqK2IMjW00B0=; b=eO68TZsayJUnzF9ppfHaPSwA9f3cpJtiM/bQv1NjIx1viksnomhVCZg/OPfesExCQe GPBpKhmjuCayGTZhMCQafKSmuYZ+smeOmFtUbvbkWS5siPQw7nd2sd5jds4KSpKgIMG1 5rAZMh+2rFOS1AHpWlupbprjH1FaaIPoFdMADMVIZrEncJu5IwxbAAD77jq17vk7nHI/ pHx7Yx3y5+Ja+7mdhxYJ7U+IEaib+spU25N1cAyqYIyEgycYL490w9sjOc0zBhbj55fl oXav/h9TUMbHXVfYpZ1MYXyKyDqxgCsBYAWNUi6HD47X/71tlKLWuj7QNFucy0Ul1WpD v86A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=wpN/U5eluX6Jnat+pEDX2C0UlbFhsXsgqK2IMjW00B0=; b=k2kUzrY4SYQ1HwTRTHRX/y3uVPy+8UlCLEeL4nxFjlWas06/df1Fl83rU7Hv8gKpAx MOul15VkOZw9ffwRfTqz6Ak0DusQgIN4ACmO/bS16ch+Xtb2G/1kqXFu6Gw4fp6qMLpC B0pBVExtvGkEEuVuLALsSgLdBYyEZ1RTW8VhXrQwAJ6hrtVdBf0uZpp6dnu6cU1Q4zWo gKSx4GzVf4yawcjfcCuRqkHbd5InP97qxxfs+8ax0B4Jw0pCfx9T7/GvkYQIHw/fl5MS SFUGyU4FCGVVTYTdRfadPbeh9nWmRy6yavrWpTQQ1d45GjF/3Brv3Ax6NNhQIFzUe0d7 sTBQ== X-Gm-Message-State: AOAM530cHnGssKhipa0WA2hrdzG1BTcJexjPlrLe7+dkaeMTyXbRo3AL 0G4uTJHSr14jGeVZEUHMlhs++AnCANEKWaBq8KNzbz2P+G40+ZBu0HtX8l6yR74azszgzNnFpl1 JdA3ImlrWGbTJUFPF9d3DhyC//20tj9U08kjgZkcNkrIojCvYxf6jOb8baEVUjlXwPdrE6x4B X-Google-Smtp-Source: ABdhPJy8YBcGGLS1CDa1jQAvWHy6IX+C9Fx9N9t0vFRpgYOq3ptmbXD9mdu9QbdYBQ3ESjXtAc+9DWkF+HCP X-Received: from 
js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7141:0:b0:2d3:d549:23f8 with SMTP id m62-20020a817141000000b002d3d54923f8mr27573261ywc.87.1645593872454; Tue, 22 Feb 2022 21:24:32 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:57 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-22-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 21/47] mm: asi: Add support for locally non-sensitive VM_USERMAP pages From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" VM_USERMAP pages can be mapped into userspace, which would overwrite the asi_mm field, so we restore that field when freeing these pages. 
Signed-off-by: Junaid Shahid --- mm/vmalloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ea94d8a1e2e9..a89866a926f6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2424,6 +2424,14 @@ static void asi_unmap_vm_area(struct vm_struct *area) */ if (area->flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) asi_unmap(area->asi, area->addr, get_vm_area_size(area), true); + + if (area->flags & VM_USERMAP) { + uint i; + + for (i =3D 0; i < area->nr_pages; i++) + if (PageLocalNonSensitive(area->pages[i])) + area->pages[i]->asi_mm =3D area->asi->mm; + } } =20 #else --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59A3FC433EF for ; Wed, 23 Feb 2022 05:26:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238337AbiBWF0s (ORCPT ); Wed, 23 Feb 2022 00:26:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238330AbiBWFZm (ORCPT ); Wed, 23 Feb 2022 00:25:42 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC4646D953 for ; Tue, 22 Feb 2022 21:24:48 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id i205-20020a2522d6000000b00622c778ac7cso26758116ybi.3 for ; Tue, 22 Feb 2022 21:24:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=eHvI/S5CbFiuxrTOsf9OKECf8MKQA40+sVHVsgBWXTI=; b=V0N2AQ61NCzV6BZmqiQxUYs5pGX63pZ6VVRWlptLLGQUiVMxzlkVsCzNfrYnsMXVug 
KGkm/PYQEu9If/h+TwyTP773A843C1vccvswNpzQzC+CWF0lyH9rv4lYVpwygBHxjjxF KtfQ48TD03/CH3MJc6uQdii7U/orjcd6yROvM49vbkxsBatJT7hHIeSSgYOHhBpNRFtZ ikDCPeVq4k3/1dGPRt9rMDvqd7qp2qvlw0rdUxHJz6cMzz6PpmzwFKFFGSl65Hota3P/ hcaYP68KMD05XrecJbTtx+liHxuoMDRkdxyvRDaTUelX+R7q43SiNNrm4qp+y/niXgLp +m3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=eHvI/S5CbFiuxrTOsf9OKECf8MKQA40+sVHVsgBWXTI=; b=t/0XFCQpQntOWvt0UpdJ0ZxaqKnq2n+1Hu0sEBqQGyfC14DYrht2kNdgyWC13JJIt9 iWm+MX2EiL/sfeNFrceTh1afSlfbeKtDSUUjW9j+t0MOeNguuirlUnYuajsXHDMOax2d io9DWdS7wy4J7420iGzgT1Lc2IZs3rwopf9x20QTKeyqMXzaq7ki1HL608n92hAhlN+s CpRbuN9Air5RKwYm8aXW5u8MhfeD4d3zEZNq1OtSd9aQgx8JJwC8qEtb0I3oQIcmKCPe 6Foo9GQOwZga5qWxlZYwvb/vx4Mb/K0BUHMjS1Mkiy+CdDslXNuxn0fMG1qzUxLXN7yq bjQg== X-Gm-Message-State: AOAM531ijgeESWaFaNVBySC8+dB077H0FdIRZclQp1XFdBW7anJkLbLr te9zd0xC2hEohnjtF3oAxAmChbEuuRL4fua/VBvqKdojroUKDMngCUL37B9A+3FzInbtwOl5BQ4 p53XlT7haBcpWe24QvxSbdURxBXHpAvB5qNOU+Pv0H5dYLKK2/dp8XcBKK12ppjPGNmXTuzgX X-Google-Smtp-Source: ABdhPJylISnqnK8ITQ9nblGksWK/PwAayD9axmVlFF61BCPqFWoHn9afvvtl0MNwA3JR60UblOXBCmfjIzzy X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7951:0:b0:2d6:b7bf:216a with SMTP id u78-20020a817951000000b002d6b7bf216amr24436525ywc.258.1645593874907; Tue, 22 Feb 2022 21:24:34 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:58 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-23-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 22/47] mm: asi: Added refcounting when initilizing an asi From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, 
alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse Some KVM tests initialize multiple VMs in a single process. For these cases, we want to support multiple calls to asi_init() before a single asi_destroy() is called. We want the initialization to happen exactly once. When asi_destroy() is called, release the resources only if the counter has reached zero. In our current implementation, asi structs are tied to a specific mm. This may change in a future implementation, in which case the mutex for the refcounting will need to move to struct asi. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/asi.h | 1 + arch/x86/mm/asi.c | 52 +++++++++++++++++++++++++++++++++----- include/linux/mm_types.h | 2 ++ kernel/fork.c | 3 +++ 4 files changed, 51 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index e3cbf6d8801e..2dc465f78bcc 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -40,6 +40,7 @@ struct asi { pgd_t *pgd; struct asi_class *class; struct mm_struct *mm; + int64_t asi_ref_count; }; =20 DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 91e5ff1224ff..ac35323193a3 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -282,9 +282,25 @@ static int __init asi_global_init(void) } subsys_initcall(asi_global_init) =20 +/* We're assuming we hold mm->asi_init_lock */ +static void __asi_destroy(struct asi *asi) +{ + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + + /* If refcount is non-zero, it means asi_init() was called multiple + * times. We free the asi pgd only when the last VM is destroyed.
= */ + if (--(asi->asi_ref_count) > 0) + return; + + asi_free_pgd(asi); + memset(asi, 0, sizeof(struct asi)); +} + int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { - struct asi *asi =3D &mm->asi[asi_index]; + int err =3D 0; + struct asi *asi =3D &mm->asi[asi_index]; =20 *out_asi =3D NULL; =20 @@ -295,6 +311,15 @@ int asi_init(struct mm_struct *mm, int asi_index, stru= ct asi **out_asi) WARN_ON(asi_index =3D=3D 0 || asi_index >=3D ASI_MAX_NUM); WARN_ON(asi->pgd !=3D NULL); =20 + /* Currently, mm and asi structs are conceptually tied together. In + * future implementations an asi object might be unrelated to a sp= ecicic + * mm. In that future implementation - the mutex will have to be i= nside + * asi. */ + mutex_lock(&mm->asi_init_lock); + + if (asi->asi_ref_count++ > 0) + goto exit_unlock; /* err is 0 */ + /* * For now, we allocate 2 pages to avoid any potential problems with * KPTI code. This won't be needed once KPTI is folded into the ASI @@ -302,8 +327,10 @@ int asi_init(struct mm_struct *mm, int asi_index, stru= ct asi **out_asi) */ asi->pgd =3D (pgd_t *)__get_free_pages(GFP_PGTABLE_USER, PGD_ALLOCATION_ORDER); - if (!asi->pgd) - return -ENOMEM; + if (!asi->pgd) { + err =3D -ENOMEM; + goto exit_unlock; + } =20 asi->class =3D &asi_class[asi_index]; asi->mm =3D mm; @@ -328,19 +355,30 @@ int asi_init(struct mm_struct *mm, int asi_index, str= uct asi **out_asi) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } =20 - *out_asi =3D asi; +exit_unlock: + if (err) + __asi_destroy(asi); =20 - return 0; + /* This unlock signals future asi_init() callers that we finished.= */ + mutex_unlock(&mm->asi_init_lock); + + if (!err) + *out_asi =3D asi; + return err; } EXPORT_SYMBOL_GPL(asi_init); =20 void asi_destroy(struct asi *asi) { + struct mm_struct *mm; + if (!boot_cpu_has(X86_FEATURE_ASI) || !asi) return; =20 - asi_free_pgd(asi); - memset(asi, 0, sizeof(struct asi)); + mm =3D asi->mm; + mutex_lock(&mm->asi_init_lock); + __asi_destroy(asi); + 
mutex_unlock(&mm->asi_init_lock); } EXPORT_SYMBOL_GPL(asi_destroy); =20 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index f9702d070975..e6980ae31323 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -16,6 +16,7 @@ #include #include #include +#include =20 #include #include @@ -628,6 +629,7 @@ struct mm_struct { * these resources for every mm in the system, we expect t= hat * only VM mm's will have this flag set. */ bool asi_enabled; + struct mutex asi_init_lock; #endif struct user_namespace *user_ns; =20 diff --git a/kernel/fork.c b/kernel/fork.c index dd5a86e913ea..68b3aeab55ac 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1084,6 +1084,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, =20 mm->user_ns =3D get_user_ns(user_ns); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + mutex_init(&mm->asi_init_lock); +#endif return mm; =20 fail_noasi: --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12519C433FE for ; Wed, 23 Feb 2022 05:26:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238332AbiBWF0g (ORCPT ); Wed, 23 Feb 2022 00:26:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238331AbiBWFZm (ORCPT ); Wed, 23 Feb 2022 00:25:42 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F27386D964 for ; Tue, 22 Feb 2022 21:24:49 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d74a0ff060so75803747b3.6 for ; Tue, 22 Feb 2022 21:24:49 -0800 (PST) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=migmFG7UQDaW8MeCncNCopIOLc7/KCi4nAx8Y23Kqls=; b=ZIf827+EcfBkAYznkNAzIxjepuJOo5jGJUpKJJAWVlbJLp/v2xx9WndG0Zfv0Y30pp QVoiE4x5/isYQS6r529V5FuPTwvJXepA8/Ig/UwtVbwxtSPhO5eBWW/xZu7GdKR2hwRD noJsJ3Ur8bMBMS1eL71R7PXbr8K1C4T4AmgEfBVK5/+rkrUOvlO+o5qBs5UJm8vfEFH8 iWvYMGLx9ZnbGLPu3ZtgRIXCqWDVhmvwLhi1ORJZdwX1PxFlhSsfJbPaCpw4raGaKYCV 43iNhXkj/THFgjjQb9mrwbCVq0tdVwLBhSI29IV5lijsBJ9zHJayAmn0j12b9dgg/i30 Q+4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=migmFG7UQDaW8MeCncNCopIOLc7/KCi4nAx8Y23Kqls=; b=tyC8PEAm+nHdFxM9SbddE+nmk4SbGQTtDT7y9ENUnSRRXXPJs6PCbuIAb9OLtbrUR1 JkvZy1y1Jz+WzRsmmvLQejtn6WtQDD1Ub+zUrtB73Ro9EPj8UVIt6HcfQ1zLfjQrMYAG r6iYGN1y0y8pCTZp/img5Lk9X9PZF6tRv5xoBhYQwfqARBbYEpOnhnfoppWtOGWRoTIA uCu2z5FoR1JCMOBPT0GAfc/EYsEPm5rihp+thouE7U+3LaCMmwnOHyvqtpytXam+V1xF 7AWvibOFH2s/MiDESyesvKIvuyNI6ElYCmvxg25TSYLfxeb4XPIV9LovQYPvSoiJxdpN /tKg== X-Gm-Message-State: AOAM533I7XGRq/O6ns+TFI5J6Q/CqbRYi1m53s7hoBdGr9lPTeZRWyty SdwEgo0/orVTVoVjdRUL01ChilgrCUCQb6SJDmxxZaRLRY01r9N0W51pkKu7YCkz5XV8+lFe5Lq AmBJGIrjXHhT347RCfWmUKCT8S9i+0PoiiQc8crV27qw2+TFcTQ8OOvXUBlAvJyvO2HmVrjHc X-Google-Smtp-Source: ABdhPJzpSBL0VJdokvTlUTI5d1w3wFKyOwiaPENifHATjNeM+qviCG3MD0z5nH5EeDzE1oVe+ERWkd3RD0Jr X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:84d5:0:b0:2d1:e85:bf04 with SMTP id u204-20020a8184d5000000b002d10e85bf04mr27926930ywf.465.1645593877093; Tue, 22 Feb 2022 21:24:37 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:59 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-24-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 
2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 23/47] mm: asi: Add support for mapping all userspace memory into ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds a new ASI class flag, ASI_MAP_ALL_USERSPACE, which if set, would automatically map all userspace addresses into that ASI address space. This is achieved by lazily cloning the userspace PGD entries during page faults encountered while in that restricted address space. When the userspace PGD entry is cleared (e.g. in munmap()), we go through all restricted address spaces with the ASI_MAP_ALL_USERSPACE flag and clear the corresponding entry in those address spaces as well. 
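The clone and clear paths described above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the kernel code: model_pgd_t, struct model_asi, MODEL_MAP_ALL_USERSPACE and both model_* functions are hypothetical names, C11 atomics stand in for set_pgd()/pgd_val(), a sequentially consistent fence stands in for smp_mb(), and the interrupt-disabling and locking that the real paths rely on are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PGD entry; not the kernel's pgd_t. */
typedef _Atomic uint64_t model_pgd_t;

/* Hypothetical flag value, mirroring ASI_MAP_ALL_USERSPACE (2). */
#define MODEL_MAP_ALL_USERSPACE 2u

/* Hypothetical restricted address space with a single PGD slot. */
struct model_asi {
	unsigned int flags;
	model_pgd_t entry;
};

/*
 * Model of the lazy clone: copy *src into *dst unless *dst is already
 * populated, then re-read *src and repeat until the value is stable.
 * Returns the value that ends up in *dst.
 */
static uint64_t model_clone_user_pgd(model_pgd_t *dst, model_pgd_t *src)
{
	uint64_t old_src, curr_src;

	assert(dst != NULL && src != NULL);

	curr_src = atomic_load(dst);
	if (curr_src)
		return curr_src;	/* already cloned, nothing to do */

	curr_src = atomic_load(src);
	do {
		atomic_store(dst, curr_src);
		/* Full fence, standing in for smp_mb() in the patch. */
		atomic_thread_fence(memory_order_seq_cst);
		old_src = curr_src;
		curr_src = atomic_load(src);
	} while (old_src != curr_src);

	return old_src;
}

/*
 * Model of the clear path: the caller has already cleared the source
 * entry; mirror that into every restricted address space that has the
 * map-all-userspace flag set.
 */
static void model_clear_user_pgd(struct model_asi *asis, unsigned int n)
{
	assert(asis != NULL);

	for (unsigned int i = 0; i < n; i++)
		if (asis[i].flags & MODEL_MAP_ALL_USERSPACE)
			atomic_store(&asis[i].entry, 0);
}
```

The loop only terminates once the value written to the destination matches a later read of the source, so a clear that lands between the first read and the write gets picked up on the next iteration.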
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 + arch/x86/mm/asi.c | 81 ++++++++++++++++++++++++++++++++++++++ include/asm-generic/asi.h | 7 ++++ mm/memory.c | 2 + 4 files changed, 92 insertions(+) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 2dc465f78bcc..062ccac07fd9 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -68,6 +68,8 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, b= ool flush_tlb); void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); void asi_sync_mapping(struct asi *asi, void *addr, size_t len); void asi_do_lazy_map(struct asi *asi, size_t addr); +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr); +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr); =20 static inline void asi_init_thread_state(struct thread_struct *thread) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ac35323193a3..a3d96be76fa9 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -702,6 +702,41 @@ static bool is_addr_in_local_nonsensitive_range(size_t= addr) addr < VMALLOC_GLOBAL_NONSENSITIVE_START; } =20 +static void asi_clone_user_pgd(struct asi *asi, size_t addr) +{ + pgd_t *src =3D pgd_offset_pgd(asi->mm->pgd, addr); + pgd_t *dst =3D pgd_offset_pgd(asi->pgd, addr); + pgdval_t old_src, curr_src; + + if (pgd_val(*dst)) + return; + + VM_BUG_ON(!irqs_disabled()); + + /* + * This synchronizes against the PGD entry getting cleared by + * free_pgd_range(). That path has the following steps: + * 1. pgd_clear + * 2. asi_clear_user_pgd + * 3. Remote TLB Flush + * 4. Free page tables + * + * (3) will be blocked for the duration of this function because the + * IPI will remain pending until interrupts are re-enabled. + * + * The following loop ensures that if we read the PGD value before + * (1) and write it after (2), we will re-read the value and write + * the new updated value. 
+ */ + curr_src =3D pgd_val(*src); + do { + set_pgd(dst, __pgd(curr_src)); + smp_mb(); + old_src =3D curr_src; + curr_src =3D pgd_val(*src); + } while (old_src !=3D curr_src); +} + void asi_do_lazy_map(struct asi *asi, size_t addr) { if (!static_cpu_has(X86_FEATURE_ASI) || !asi) @@ -710,6 +745,9 @@ void asi_do_lazy_map(struct asi *asi, size_t addr) if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && is_addr_in_local_nonsensitive_range(addr)) asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); + else if ((asi->class->flags & ASI_MAP_ALL_USERSPACE) && + addr < TASK_SIZE_MAX) + asi_clone_user_pgd(asi, addr); } =20 /* @@ -766,3 +804,46 @@ void __init asi_vmalloc_init(void) VM_BUG_ON(vmalloc_local_nonsensitive_end >=3D VMALLOC_END); VM_BUG_ON(vmalloc_global_nonsensitive_start <=3D VMALLOC_START); } + +static void __asi_clear_user_pgd(struct mm_struct *mm, size_t addr) +{ + uint i; + + if (!static_cpu_has(X86_FEATURE_ASI) || !mm_asi_enabled(mm)) + return; + + /* + * This function is called right after pgd_clear/p4d_clear. + * We need to be sure that the preceding pXd_clear is visible before + * the ASI pgd clears below. Compare with asi_clone_user_pgd(). + */ + smp_mb__before_atomic(); + + /* + * We need to ensure that the ASI PGD tables do not get freed from + * under us. We can potentially use RCU to avoid that, but since + * this path is probably not going to be too performance sensitive, + * so we just acquire the lock to block asi_destroy(). 
+ */ + mutex_lock(&mm->asi_init_lock); + + for (i =3D 1; i < ASI_MAX_NUM; i++) + if (mm->asi[i].class && + (mm->asi[i].class->flags & ASI_MAP_ALL_USERSPACE)) + set_pgd(pgd_offset_pgd(mm->asi[i].pgd, addr), + native_make_pgd(0)); + + mutex_unlock(&mm->asi_init_lock); +} + +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr) +{ + if (pgtable_l5_enabled()) + __asi_clear_user_pgd(mm, addr); +} + +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) +{ + if (!pgtable_l5_enabled()) + __asi_clear_user_pgd(mm, addr); +} diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 7c50d8b64fa4..8513d0d7865a 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -6,6 +6,7 @@ =20 /* ASI class flags */ #define ASI_MAP_STANDARD_NONSENSITIVE 1 +#define ASI_MAP_ALL_USERSPACE 2 =20 #ifndef CONFIG_ADDRESS_SPACE_ISOLATION =20 @@ -85,6 +86,12 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, = bool flush_tlb) { } static inline void asi_do_lazy_map(struct asi *asi, size_t addr) { } =20 +static inline +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr) { } + +static inline +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } =20 diff --git a/mm/memory.c b/mm/memory.c index 8f1de811a1dc..667ece86e051 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -296,6 +296,7 @@ static inline void free_pud_range(struct mmu_gather *tl= b, p4d_t *p4d, =20 pud =3D pud_offset(p4d, start); p4d_clear(p4d); + asi_clear_user_p4d(tlb->mm, start); pud_free_tlb(tlb, pud, start); mm_dec_nr_puds(tlb->mm); } @@ -330,6 +331,7 @@ static inline void free_p4d_range(struct mmu_gather *tl= b, pgd_t *pgd, =20 p4d =3D p4d_offset(pgd, start); pgd_clear(pgd); + asi_clear_user_pgd(tlb->mm, start); p4d_free_tlb(tlb, p4d, start); } =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) 
on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B835C433FE for ; Wed, 23 Feb 2022 05:26:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238301AbiBWF1B (ORCPT ); Wed, 23 Feb 2022 00:27:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238241AbiBWF0E (ORCPT ); Wed, 23 Feb 2022 00:26:04 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 064A16E298 for ; Tue, 22 Feb 2022 21:24:52 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id a19-20020a25ca13000000b0061db44646b3so26599575ybg.2 for ; Tue, 22 Feb 2022 21:24:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ztkGsqUqkQcfbxlBdMnsFZA5zf9U5+9bkMDZbcQFZD8=; b=n78DK1+xw1pRWsJW1CFG6TQ6Gavlq4ZQ5W8oD9M2VCMnu8PrU3OyRhoo6bkFuFB9Bp w2aijS13h+Op8okZ6CS3EIZUB1IYhktNxmLMznN4orFxlVaThvl2eVvRB8zbVhIFGOvq jpsqzPo5HOiq6C+toVkqrWMJLKA9J7cIFPDVog0m9lqNblUfFa8H0hzg7NsV5GaTgQ4A 5jBI/lIXeOUGWHVKqN95Y8V5NuPIznfs6f5wBqsu6+GDk2tYuPnXsBHJNOaqpE8YL9hS tuNt6ye+jTS2wLjdTDudRIQlCeKYaeA1rt8TrSn8xLY0Ts3RiaaKU5y9wgWJ9NQ/XyT5 wAiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ztkGsqUqkQcfbxlBdMnsFZA5zf9U5+9bkMDZbcQFZD8=; b=Of9eW+AsWER3F4br3jVWy/Alz8KBXVgFlbmLAnzdOAT4d8j2gg1itms4sXzc6ENPhQ jNXr9QZJls3yKXaEdTGUvvshlOfMnlBRlEvQGQ4ngfR9HExLYLjBG0+m3zuchefkDAo5 hwLLB4saz2L1DDWKyqkrjXvI9BvHEOY7dvaECfF5hBgCtcvcORDjqr14vv5VYQcNc5lK vy/fZWZSnrH7i6hgRV6Lt9HFGGuXTbxZp/RANXDM6+EQvlbbNNws2UCvhN781GdGDmrS 
/wnl4jsYDzEb74qgLWk35IrwyLIEhztUJx6XZ2XTQlpus+OttWPZKzVd5yfetSPVliD/ 2YgQ== X-Gm-Message-State: AOAM531ZM+mZTZaHjVzbtZhcCRYsBNJrvMGd2r3agC7N69atA3mkTfbC CsxXdZ+yC+isZ2oDsM4O260f1FBnRIqZTI4S42J+hO+IBR9Go6XFAtLWhtLEN2Ntts1+1i9rIj7 yTYBrCoqULf9DC3pReE3K8NjVlK0RbJVo18epd2q3dWanIA28ZdKaA6nYkPE34vNloZbZjzwE X-Google-Smtp-Source: ABdhPJzcVkgi3JMg9FekP/jbB0HbpE3UWWhA+GGvHutupthetmKemwuv8RCt3TmZZl6hj5AyHVL2xNuWi5Ck X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:7694:0:b0:624:a2d9:c8f0 with SMTP id r142-20020a257694000000b00624a2d9c8f0mr10070639ybc.523.1645593879400; Tue, 22 Feb 2022 21:24:39 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:00 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-25-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 24/47] mm: asi: Support for local non-sensitive slab caches From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A new flag SLAB_LOCAL_NONSENSITIVE is added to designate that a slab cache can be used for local non-sensitive allocations. For such caches, a per-process child cache will be created when a process tries to make an allocation from that cache for the first time, similar to the per-memcg child caches that used to exist before the object based memcg charging mechanism. (A lot of the infrastructure for handling these child caches is derived from the original per-memcg cache code). 
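As a rough illustration of the on-demand child cache lookup, here is a user-space sketch. Everything in it is an assumption made for illustration: struct model_cache, struct model_mm, MODEL_MAX_LOCAL_CACHES and the model_* helpers are hypothetical names, calloc() stands in for real cache creation, and the RCU protection of the per-mm array and the slab_mutex locking used by the patch are omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_LOCAL_CACHES 32	/* hypothetical fixed bound */

/* Hypothetical stand-in for struct kmem_cache. */
struct model_cache {
	const char *name;
	int cache_id;			/* meaningful for root caches only */
	struct model_cache *root;	/* NULL for a root cache */
};

/* Per-process state: one child-cache slot per root cache ID. */
struct model_mm {
	struct model_cache *local_caches[MODEL_MAX_LOCAL_CACHES];
};

/*
 * Return the child of @root that belongs to @mm, creating it on first
 * use. Returns NULL if the ID is out of range or allocation fails.
 */
static struct model_cache *model_get_local_cache(struct model_cache *root,
						 struct model_mm *mm)
{
	struct model_cache *child;
	int id;

	assert(root != NULL && mm != NULL);
	assert(root->root == NULL);	/* only root caches have children */

	id = root->cache_id;
	if (id < 0 || id >= MODEL_MAX_LOCAL_CACHES)
		return NULL;

	child = mm->local_caches[id];
	if (child)
		return child;		/* fast path: already created */

	child = calloc(1, sizeof(*child));
	if (!child)
		return NULL;
	child->name = root->name;	/* reported under the root's name */
	child->cache_id = -1;
	child->root = root;
	mm->local_caches[id] = child;
	return child;
}

/* Free every child cache owned by @mm, as on process teardown. */
static void model_free_local_caches(struct model_mm *mm)
{
	assert(mm != NULL);

	for (int i = 0; i < MODEL_MAX_LOCAL_CACHES; i++) {
		free(mm->local_caches[i]);
		mm->local_caches[i] = NULL;
	}
}
```

Two processes that allocate from the same root cache end up with distinct children in this model, and a repeated lookup from the same process returns the child created the first time.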
If a cache only has SLAB_LOCAL_NONSENSITIVE, then all allocations from that cache will automatically be considered locally non-sensitive. But if a cache has both SLAB_LOCAL_NONSENSITIVE and SLAB_GLOBAL_NONSENSITIVE, then each allocation must specify one of __GFP_LOCAL_NONSENSITIVE or __GFP_GLOBAL_NONSENSITIVE. Note that the first locally non-sensitive allocation that a process makes from a given slab cache must occur from a sleepable context. If that is not the case, then a new kmem_cache_precreate_local* API must be called from a sleepable context before the first allocation. Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 5 + include/linux/mm_types.h | 4 + include/linux/sched/mm.h | 12 ++ include/linux/slab.h | 38 +++- include/linux/slab_def.h | 4 + kernel/fork.c | 3 +- mm/slab.c | 41 ++++- mm/slab.h | 151 +++++++++++++++- mm/slab_common.c | 363 ++++++++++++++++++++++++++++++++++++++- 9 files changed, 602 insertions(+), 19 deletions(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index a3d96be76fa9..6b9a0f5ab391 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -4,6 +4,7 @@ #include #include #include +#include =20 #include #include @@ -455,6 +456,8 @@ int asi_init_mm_state(struct mm_struct *mm) =20 memset(mm->asi, 0, sizeof(mm->asi)); mm->asi_enabled =3D false; + RCU_INIT_POINTER(mm->local_slab_caches, NULL); + mm->local_slab_caches_array_size =3D 0; =20 /* * TODO: In addition to a cgroup flag, we may also want a per-process @@ -482,6 +485,8 @@ void asi_free_mm_state(struct mm_struct *mm) if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) return; =20 + free_local_slab_caches(mm); + asi_free_pgd_range(&mm->asi[0], pgd_index(ASI_LOCAL_MAP), pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)) + 1); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e6980ae31323..56511adc263e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -517,6 +517,10 @@ struct mm_struct { =20 struct asi asi[ASI_MAX_NUM]; =20 
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_cache * __rcu *local_slab_caches; + uint local_slab_caches_array_size; +#endif /** * @mm_users: The number of users including userspace. * diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index aca874d33fe6..c9122d4436d4 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -37,9 +37,21 @@ static inline void mmgrab(struct mm_struct *mm) } =20 extern void __mmdrop(struct mm_struct *mm); +extern void mmdrop_async(struct mm_struct *mm); =20 static inline void mmdrop(struct mm_struct *mm) { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * We really only need to do this if we are in an atomic context. + * Unfortunately, there doesn't seem to be a reliable way to detect + * atomic context across all kernel configs. So we just always do async. + */ + if (rcu_access_pointer(mm->local_slab_caches)) { + mmdrop_async(mm); + return; + } +#endif /* * The implicit full barrier implied by atomic_dec_and_test() is * required by the membarrier system call before returning to diff --git a/include/linux/slab.h b/include/linux/slab.h index 7b8a3853d827..ef9c73c0d874 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -93,6 +93,8 @@ /* Avoid kmemleak tracing */ #define SLAB_NOLEAKTRACE ((slab_flags_t __force)0x00800000U) =20 +/* 0x01000000U is used below for SLAB_LOCAL_NONSENSITIVE */ + /* Fault injection mark */ #ifdef CONFIG_FAILSLAB # define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U) @@ -121,8 +123,10 @@ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x10000000U) =20 #ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define SLAB_LOCAL_NONSENSITIVE ((slab_flags_t __force)0x01000000U) #define SLAB_GLOBAL_NONSENSITIVE ((slab_flags_t __force)0x20000000U) #else +#define SLAB_LOCAL_NONSENSITIVE 0 #define SLAB_GLOBAL_NONSENSITIVE 0 #endif =20 @@ -377,7 +381,8 @@ static __always_inline struct kmem_cache *get_kmalloc_c= ache(gfp_t flags, { #ifdef CONFIG_ADDRESS_SPACE_ISOLATION =20 - if 
(static_asi_enabled() && (flags & __GFP_GLOBAL_NONSENSITIVE)) + if (static_asi_enabled() && + (flags & (__GFP_GLOBAL_NONSENSITIVE | __GFP_LOCAL_NONSENSITIVE))) return nonsensitive_kmalloc_caches[kmalloc_type(flags)][index]; #endif return kmalloc_caches[kmalloc_type(flags)][index]; @@ -800,4 +805,35 @@ int slab_dead_cpu(unsigned int cpu); #define slab_dead_cpu NULL #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags); +void free_local_slab_caches(struct mm_struct *mm); +int kmem_cache_precreate_local(struct kmem_cache *s); +int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t flags); + +#else + +static inline +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags) +{ + return NULL; +} + +static inline void free_local_slab_caches(struct mm_struct *mm) { } + +static inline int kmem_cache_precreate_local(struct kmem_cache *s) +{ + return 0; +} + +static inline int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t fl= ags) +{ + return 0; +} + +#endif + #endif /* _LINUX_SLAB_H */ diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 3aa5e1e73ab6..53cbc1f40031 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -81,6 +81,10 @@ struct kmem_cache { unsigned int *random_seq; #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_local_cache_info local_cache_info; +#endif + unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ =20 diff --git a/kernel/fork.c b/kernel/fork.c index 68b3aeab55ac..d7f55de00947 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -714,13 +714,14 @@ static void mmdrop_async_fn(struct work_struct *work) __mmdrop(mm); } =20 -static void mmdrop_async(struct mm_struct *mm) +void mmdrop_async(struct mm_struct *mm) { if (unlikely(atomic_dec_and_test(&mm->mm_count))) { INIT_WORK(&mm->async_put_work, 
mmdrop_async_fn); schedule_work(&mm->async_put_work); } } +EXPORT_SYMBOL(mmdrop_async); =20 static inline void free_signal_struct(struct signal_struct *sig) { diff --git a/mm/slab.c b/mm/slab.c index 5a928d95d67b..44cf6d127a4c 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1403,6 +1403,8 @@ static void kmem_freepages(struct kmem_cache *cachep,= struct page *page) /* In union with page->mapping where page allocator expects NULL */ page->slab_cache =3D NULL; =20 + restore_page_nonsensitive_metadata(page, cachep); + if (current->reclaim_state) current->reclaim_state->reclaimed_slab +=3D 1 << order; unaccount_slab_page(page, order, cachep); @@ -2061,11 +2063,9 @@ int __kmem_cache_create(struct kmem_cache *cachep, s= lab_flags_t flags) cachep->allocflags |=3D GFP_DMA32; if (flags & SLAB_RECLAIM_ACCOUNT) cachep->allocflags |=3D __GFP_RECLAIMABLE; - if (flags & SLAB_GLOBAL_NONSENSITIVE) - cachep->allocflags |=3D __GFP_GLOBAL_NONSENSITIVE; cachep->size =3D size; cachep->reciprocal_buffer_size =3D reciprocal_value(size); - + set_nonsensitive_cache_params(cachep); #if DEBUG /* * If we're going to use the generic kernel_map_pages() @@ -3846,8 +3846,8 @@ static int setup_kmem_cache_nodes(struct kmem_cache *= cachep, gfp_t gfp) } =20 /* Always called with the slab_mutex held */ -static int do_tune_cpucache(struct kmem_cache *cachep, int limit, - int batchcount, int shared, gfp_t gfp) +static int __do_tune_cpucache(struct kmem_cache *cachep, int limit, + int batchcount, int shared, gfp_t gfp) { struct array_cache __percpu *cpu_cache, *prev; int cpu; @@ -3892,6 +3892,29 @@ static int do_tune_cpucache(struct kmem_cache *cache= p, int limit, return setup_kmem_cache_nodes(cachep, gfp); } =20 +static int do_tune_cpucache(struct kmem_cache *cachep, int limit, + int batchcount, int shared, gfp_t gfp) +{ + int ret; + struct kmem_cache *c; + + ret =3D __do_tune_cpucache(cachep, limit, batchcount, shared, gfp); + + if (slab_state < FULL) + return ret; + + if ((ret < 0) || 
!is_root_cache(cachep)) + return ret; + + lockdep_assert_held(&slab_mutex); + for_each_child_cache(c, cachep) { + /* return value determined by the root cache only */ + __do_tune_cpucache(c, limit, batchcount, shared, gfp); + } + + return ret; +} + /* Called with slab_mutex held always */ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp) { @@ -3904,6 +3927,14 @@ static int enable_cpucache(struct kmem_cache *cachep= , gfp_t gfp) if (err) goto end; =20 + if (!is_root_cache(cachep)) { + struct kmem_cache *root =3D get_root_cache(cachep); + + limit =3D root->limit; + shared =3D root->shared; + batchcount =3D root->batchcount; + } + /* * The head array serves three purposes: * - create a LIFO ordering, i.e. return objects that are cache-warm diff --git a/mm/slab.h b/mm/slab.h index f190f4fc0286..b9e11038be27 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -5,6 +5,45 @@ * Internal slab definitions */ =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_local_cache_info { + /* Valid for child caches. NULL for the root cache itself. */ + struct kmem_cache *root_cache; + union { + /* For root caches */ + struct { + int cache_id; + struct list_head __root_caches_node; + struct list_head children; + /* + * For SLAB_LOCAL_NONSENSITIVE root caches, this points + * to the cache to be used for local non-sensitive + * allocations from processes without ASI enabled. + * + * For root caches with only SLAB_LOCAL_NONSENSITIVE, + * the root cache itself is used as the sensitive cache. + * + * For root caches with both SLAB_LOCAL_NONSENSITIVE and + * SLAB_GLOBAL_NONSENSITIVE, the sensitive cache will be + * a child cache allocated on-demand. + * + * For non-sensiitve kmalloc caches, the sensitive cache + * will just be the corresponding regular kmalloc cache. 
+ */ + struct kmem_cache *sensitive_cache; + }; + + /* For child (process-local) caches */ + struct { + struct mm_struct *mm; + struct list_head children_node; + }; + }; +}; + +#endif + #ifdef CONFIG_SLOB /* * Common fields provided in kmem_cache by all slab allocators @@ -128,8 +167,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned in= t object_size, } #endif =20 -/* This will also include SLAB_LOCAL_NONSENSITIVE in a later patch. */ -#define SLAB_NONSENSITIVE SLAB_GLOBAL_NONSENSITIVE +#define SLAB_NONSENSITIVE (SLAB_GLOBAL_NONSENSITIVE | SLAB_LOCAL_NONSENSIT= IVE) =20 /* Legal flag mask for kmem_cache_create(), for various configurations */ #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ @@ -251,6 +289,99 @@ static inline bool kmem_cache_debug_flags(struct kmem_= cache *s, slab_flags_t fla return false; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +/* List of all root caches. */ +extern struct list_head slab_root_caches; +#define root_caches_node local_cache_info.__root_caches_node + +/* + * Iterate over all child caches of the given root cache. The caller must = hold + * slab_mutex. + */ +#define for_each_child_cache(iter, root) \ + list_for_each_entry(iter, &(root)->local_cache_info.children, \ + local_cache_info.children_node) + +static inline bool is_root_cache(struct kmem_cache *s) +{ + return !s->local_cache_info.root_cache; +} + +static inline bool slab_equal_or_root(struct kmem_cache *s, + struct kmem_cache *p) +{ + return p =3D=3D s || p =3D=3D s->local_cache_info.root_cache; +} + +/* + * We use suffixes to the name in child caches because we can't have caches + * created in the system with the same name. 
But when we print them + * locally, better refer to them with the base name + */ +static inline const char *cache_name(struct kmem_cache *s) +{ + if (!is_root_cache(s)) + s =3D s->local_cache_info.root_cache; + return s->name; +} + +static inline struct kmem_cache *get_root_cache(struct kmem_cache *s) +{ + if (is_root_cache(s)) + return s; + return s->local_cache_info.root_cache; +} + +static inline +void restore_page_nonsensitive_metadata(struct page *page, + struct kmem_cache *cachep) +{ + if (PageLocalNonSensitive(page)) { + VM_BUG_ON(is_root_cache(cachep)); + page->asi_mm =3D cachep->local_cache_info.mm; + } +} + +void set_nonsensitive_cache_params(struct kmem_cache *s); + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +#define slab_root_caches slab_caches +#define root_caches_node list + +#define for_each_child_cache(iter, root) \ + for ((void)(iter), (void)(root); 0; ) + +static inline bool is_root_cache(struct kmem_cache *s) +{ + return true; +} + +static inline bool slab_equal_or_root(struct kmem_cache *s, + struct kmem_cache *p) +{ + return s =3D=3D p; +} + +static inline const char *cache_name(struct kmem_cache *s) +{ + return s->name; +} + +static inline struct kmem_cache *get_root_cache(struct kmem_cache *s) +{ + return s; +} + +static inline void restore_page_nonsensitive_metadata(struct page *page, + struct kmem_cache *cachep) +{ } + +static inline void set_nonsensitive_cache_params(struct kmem_cache *s) { } + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #ifdef CONFIG_MEMCG_KMEM int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, gfp_t gfp, bool new_page); @@ -449,11 +580,12 @@ static inline struct kmem_cache *cache_from_obj(struc= t kmem_cache *s, void *x) struct kmem_cache *cachep; =20 if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && + !(s->flags & SLAB_LOCAL_NONSENSITIVE) && !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) return s; =20 cachep =3D virt_to_cache(x); - if (WARN(cachep && cachep !=3D s, + if (WARN(cachep && 
!slab_equal_or_root(cachep, s), "%s: Wrong slab cache. %s but object is from %s\n", __func__, s->name, cachep->name)) print_tracking(cachep, x); @@ -501,11 +633,24 @@ static inline struct kmem_cache *slab_pre_alloc_hook(= struct kmem_cache *s, if (static_asi_enabled()) { VM_BUG_ON(!(s->flags & SLAB_GLOBAL_NONSENSITIVE) && (flags & __GFP_GLOBAL_NONSENSITIVE)); + VM_BUG_ON(!(s->flags & SLAB_LOCAL_NONSENSITIVE) && + (flags & __GFP_LOCAL_NONSENSITIVE)); + VM_BUG_ON((s->flags & SLAB_NONSENSITIVE) =3D=3D SLAB_NONSENSITIVE && + !(flags & (__GFP_LOCAL_NONSENSITIVE | + __GFP_GLOBAL_NONSENSITIVE))); } =20 if (should_failslab(s, flags)) return NULL; =20 + if (static_asi_enabled() && + (!(flags & __GFP_GLOBAL_NONSENSITIVE) && + (s->flags & SLAB_LOCAL_NONSENSITIVE))) { + s =3D get_local_kmem_cache(s, current->mm, flags); + if (!s) + return NULL; + } + if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags)) return NULL; =20 diff --git a/mm/slab_common.c b/mm/slab_common.c index 72dee2494bf8..b486b72d6344 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -42,6 +42,13 @@ static void slab_caches_to_rcu_destroy_workfn(struct wor= k_struct *work); static DECLARE_WORK(slab_caches_to_rcu_destroy_work, slab_caches_to_rcu_destroy_workfn); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static DEFINE_IDA(nonsensitive_cache_ids); +static uint max_num_local_slab_caches =3D 32; + +#endif + /* * Set of flags that will prevent slab merging */ @@ -131,6 +138,69 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_= t flags, size_t nr, return i; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +LIST_HEAD(slab_root_caches); + +static void init_local_cache_info(struct kmem_cache *s, struct kmem_cache = *root) +{ + if (root) { + s->local_cache_info.root_cache =3D root; + list_add(&s->local_cache_info.children_node, + &root->local_cache_info.children); + } else { + s->local_cache_info.cache_id =3D -1; + INIT_LIST_HEAD(&s->local_cache_info.children); + list_add(&s->root_caches_node, 
&slab_root_caches); + } +} + +static void cleanup_local_cache_info(struct kmem_cache *s) +{ + if (is_root_cache(s)) { + VM_BUG_ON(!list_empty(&s->local_cache_info.children)); + + list_del(&s->root_caches_node); + if (s->local_cache_info.cache_id >=3D 0) + ida_free(&nonsensitive_cache_ids, + s->local_cache_info.cache_id); + } else { + struct mm_struct *mm =3D s->local_cache_info.mm; + struct kmem_cache *root_cache =3D s->local_cache_info.root_cache; + int id =3D root_cache->local_cache_info.cache_id; + + list_del(&s->local_cache_info.children_node); + if (mm) { + struct kmem_cache **local_caches =3D + rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + local_caches[id] =3D NULL; + } + } +} + +void set_nonsensitive_cache_params(struct kmem_cache *s) +{ + if (s->flags & SLAB_GLOBAL_NONSENSITIVE) { + s->allocflags |=3D __GFP_GLOBAL_NONSENSITIVE; + VM_BUG_ON(!is_root_cache(s)); + } else if (s->flags & SLAB_LOCAL_NONSENSITIVE) { + if (is_root_cache(s)) + s->local_cache_info.sensitive_cache =3D s; + else + s->allocflags |=3D __GFP_LOCAL_NONSENSITIVE; + } +} + +#else + +static inline +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) = { } + +static inline void cleanup_local_cache_info(struct kmem_cache *s) { } + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + /* * Figure out what the alignment of the objects will be given a set of * flags, a user specified alignment and the size of the objects. 
@@ -168,6 +238,9 @@ int slab_unmergeable(struct kmem_cache *s) if (slab_nomerge || (s->flags & SLAB_NEVER_MERGE)) return 1; =20 + if (!is_root_cache(s)) + return 1; + if (s->ctor) return 1; =20 @@ -202,7 +275,7 @@ struct kmem_cache *find_mergeable(unsigned int size, un= signed int align, if (flags & SLAB_NEVER_MERGE) return NULL; =20 - list_for_each_entry_reverse(s, &slab_caches, list) { + list_for_each_entry_reverse(s, &slab_root_caches, root_caches_node) { if (slab_unmergeable(s)) continue; =20 @@ -254,6 +327,8 @@ static struct kmem_cache *create_cache(const char *name, s->useroffset =3D useroffset; s->usersize =3D usersize; =20 + init_local_cache_info(s, root_cache); + err =3D __kmem_cache_create(s, flags); if (err) goto out_free_cache; @@ -266,6 +341,7 @@ static struct kmem_cache *create_cache(const char *name, return s; =20 out_free_cache: + cleanup_local_cache_info(s); kmem_cache_free(kmem_cache, s); goto out; } @@ -459,6 +535,7 @@ static int shutdown_cache(struct kmem_cache *s) return -EBUSY; =20 list_del(&s->list); + cleanup_local_cache_info(s); =20 if (s->flags & SLAB_TYPESAFE_BY_RCU) { #ifdef SLAB_SUPPORTS_SYSFS @@ -480,6 +557,36 @@ static int shutdown_cache(struct kmem_cache *s) return 0; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static int shutdown_child_caches(struct kmem_cache *s) +{ + struct kmem_cache *c, *c2; + int r; + + VM_BUG_ON(!is_root_cache(s)); + + lockdep_assert_held(&slab_mutex); + + list_for_each_entry_safe(c, c2, &s->local_cache_info.children, + local_cache_info.children_node) { + r =3D shutdown_cache(c); + if (r) + return r; + } + + return 0; +} + +#else + +static inline int shutdown_child_caches(struct kmem_cache *s) +{ + return 0; +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + void slab_kmem_cache_release(struct kmem_cache *s) { __kmem_cache_release(s); @@ -501,7 +608,10 @@ void kmem_cache_destroy(struct kmem_cache *s) if (s->refcount) goto out_unlock; =20 - err =3D shutdown_cache(s); + err =3D shutdown_child_caches(s); + 
if (!err) + err =3D shutdown_cache(s); + if (err) { pr_err("%s %s: Slab cache still has objects\n", __func__, s->name); @@ -651,6 +761,8 @@ void __init create_boot_cache(struct kmem_cache *s, con= st char *name, s->useroffset =3D useroffset; s->usersize =3D usersize; =20 + init_local_cache_info(s, NULL); + err =3D __kmem_cache_create(s, flags); =20 if (err) @@ -897,6 +1009,13 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type ty= pe, slab_flags_t flags) */ if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type =3D=3D KMALLOC_NORMAL)) caches[type][idx]->refcount =3D -1; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (flags & SLAB_NONSENSITIVE) + caches[type][idx]->local_cache_info.sensitive_cache =3D + kmalloc_caches[type][idx]; +#endif } =20 /* @@ -1086,12 +1205,12 @@ static void print_slabinfo_header(struct seq_file *= m) void *slab_start(struct seq_file *m, loff_t *pos) { mutex_lock(&slab_mutex); - return seq_list_start(&slab_caches, *pos); + return seq_list_start(&slab_root_caches, *pos); } =20 void *slab_next(struct seq_file *m, void *p, loff_t *pos) { - return seq_list_next(p, &slab_caches, pos); + return seq_list_next(p, &slab_root_caches, pos); } =20 void slab_stop(struct seq_file *m, void *p) @@ -1099,6 +1218,24 @@ void slab_stop(struct seq_file *m, void *p) mutex_unlock(&slab_mutex); } =20 +static void +accumulate_children_slabinfo(struct kmem_cache *s, struct slabinfo *info) +{ + struct kmem_cache *c; + struct slabinfo sinfo; + + for_each_child_cache(c, s) { + memset(&sinfo, 0, sizeof(sinfo)); + get_slabinfo(c, &sinfo); + + info->active_slabs +=3D sinfo.active_slabs; + info->num_slabs +=3D sinfo.num_slabs; + info->shared_avail +=3D sinfo.shared_avail; + info->active_objs +=3D sinfo.active_objs; + info->num_objs +=3D sinfo.num_objs; + } +} + static void cache_show(struct kmem_cache *s, struct seq_file *m) { struct slabinfo sinfo; @@ -1106,8 +1243,10 @@ static void cache_show(struct kmem_cache *s, struct = seq_file *m) memset(&sinfo, 0, sizeof(sinfo)); 
get_slabinfo(s, &sinfo); =20 + accumulate_children_slabinfo(s, &sinfo); + seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d", - s->name, sinfo.active_objs, sinfo.num_objs, s->size, + cache_name(s), sinfo.active_objs, sinfo.num_objs, s->size, sinfo.objects_per_slab, (1 << sinfo.cache_order)); =20 seq_printf(m, " : tunables %4u %4u %4u", @@ -1120,9 +1259,9 @@ static void cache_show(struct kmem_cache *s, struct s= eq_file *m) =20 static int slab_show(struct seq_file *m, void *p) { - struct kmem_cache *s =3D list_entry(p, struct kmem_cache, list); + struct kmem_cache *s =3D list_entry(p, struct kmem_cache, root_caches_nod= e); =20 - if (p =3D=3D slab_caches.next) + if (p =3D=3D slab_root_caches.next) print_slabinfo_header(m); cache_show(s, m); return 0; @@ -1148,14 +1287,14 @@ void dump_unreclaimable_slab(void) pr_info("Unreclaimable slab info:\n"); pr_info("Name Used Total\n"); =20 - list_for_each_entry(s, &slab_caches, list) { + list_for_each_entry(s, &slab_root_caches, root_caches_node) { if (s->flags & SLAB_RECLAIM_ACCOUNT) continue; =20 get_slabinfo(s, &sinfo); =20 if (sinfo.num_objs > 0) - pr_info("%-17s %10luKB %10luKB\n", s->name, + pr_info("%-17s %10luKB %10luKB\n", cache_name(s), (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024); } @@ -1361,3 +1500,209 @@ int should_failslab(struct kmem_cache *s, gfp_t gfp= flags) return 0; } ALLOW_ERROR_INJECTION(should_failslab, ERRNO); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static int resize_local_slab_caches_array(struct mm_struct *mm, gfp_t flag= s) +{ + struct kmem_cache **new_array; + struct kmem_cache **old_array =3D + rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + + new_array =3D kcalloc(max_num_local_slab_caches, + sizeof(struct kmem_cache *), flags); + if (!new_array) + return -ENOMEM; + + if (old_array) + memcpy(new_array, old_array, mm->local_slab_caches_array_size * + sizeof(struct kmem_cache *)); + + rcu_assign_pointer(mm->local_slab_caches, new_array); 
+ smp_store_release(&mm->local_slab_caches_array_size, + max_num_local_slab_caches); + + if (old_array) { + synchronize_rcu(); + kfree(old_array); + } + + return 0; +} + +static int get_or_alloc_cache_id(struct kmem_cache *root_cache, gfp_t flag= s) +{ + int id =3D root_cache->local_cache_info.cache_id; + + if (id >=3D 0) + return id; + + id =3D ida_alloc_max(&nonsensitive_cache_ids, + max_num_local_slab_caches - 1, flags); + if (id =3D=3D -ENOSPC) { + max_num_local_slab_caches *=3D 2; + id =3D ida_alloc_max(&nonsensitive_cache_ids, + max_num_local_slab_caches - 1, flags); + } + + if (id >=3D 0) + root_cache->local_cache_info.cache_id =3D id; + + return id; +} + +static struct kmem_cache *create_local_kmem_cache(struct kmem_cache *root_= cache, + struct mm_struct *mm, + gfp_t flags) +{ + char *name; + struct kmem_cache *s =3D NULL; + slab_flags_t slab_flags =3D root_cache->flags & CACHE_CREATE_MASK; + struct kmem_cache **cache_ptr; + + flags &=3D GFP_RECLAIM_MASK; + + mutex_lock(&slab_mutex); + + if (mm_asi_enabled(mm)) { + struct kmem_cache **caches; + int id =3D get_or_alloc_cache_id(root_cache, flags); + + if (id < 0) + goto out; + + flags |=3D __GFP_ACCOUNT; + + if (mm->local_slab_caches_array_size <=3D id && + resize_local_slab_caches_array(mm, flags) < 0) + goto out; + + caches =3D rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + cache_ptr =3D &caches[id]; + if (*cache_ptr) { + s =3D *cache_ptr; + goto out; + } + + slab_flags &=3D ~SLAB_GLOBAL_NONSENSITIVE; + name =3D kasprintf(flags, "%s(%d:%s)", root_cache->name, + task_pid_nr(mm->owner), mm->owner->comm); + if (!name) + goto out; + + } else { + cache_ptr =3D &root_cache->local_cache_info.sensitive_cache; + if (*cache_ptr) { + s =3D *cache_ptr; + goto out; + } + + slab_flags &=3D ~SLAB_NONSENSITIVE; + name =3D kasprintf(flags, "%s(sensitive)", root_cache->name); + if (!name) + goto out; + } + + s =3D create_cache(name, + root_cache->object_size, + root_cache->align, + 
slab_flags, + root_cache->useroffset, root_cache->usersize, + root_cache->ctor, root_cache); + if (IS_ERR(s)) { + pr_info("Unable to create child kmem cache %s. Err %ld", + name, PTR_ERR(s)); + kfree(name); + s =3D NULL; + goto out; + } + + if (mm_asi_enabled(mm)) + s->local_cache_info.mm =3D mm; + + smp_store_release(cache_ptr, s); +out: + mutex_unlock(&slab_mutex); + + return s; +} + +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags) +{ + struct kmem_cache *local_cache =3D NULL; + + if (!(s->flags & SLAB_LOCAL_NONSENSITIVE) || !is_root_cache(s)) + return s; + + if (mm_asi_enabled(mm)) { + struct kmem_cache **caches; + int id =3D READ_ONCE(s->local_cache_info.cache_id); + uint array_size =3D smp_load_acquire( + &mm->local_slab_caches_array_size); + + if (id >=3D 0 && array_size > id) { + rcu_read_lock(); + caches =3D rcu_dereference(mm->local_slab_caches); + local_cache =3D smp_load_acquire(&caches[id]); + rcu_read_unlock(); + } + } else { + local_cache =3D + smp_load_acquire(&s->local_cache_info.sensitive_cache); + } + + if (!local_cache) + local_cache =3D create_local_kmem_cache(s, mm, flags); + + return local_cache; +} + +void free_local_slab_caches(struct mm_struct *mm) +{ + uint i; + struct kmem_cache **caches =3D + rcu_dereference_protected(mm->local_slab_caches, + atomic_read(&mm->mm_count) =3D=3D 0); + + if (!caches) + return; + + cpus_read_lock(); + mutex_lock(&slab_mutex); + + for (i =3D 0; i < mm->local_slab_caches_array_size; i++) + if (caches[i]) + WARN_ON(shutdown_cache(caches[i])); + + mutex_unlock(&slab_mutex); + cpus_read_unlock(); + + kfree(caches); +} + +int kmem_cache_precreate_local(struct kmem_cache *s) +{ + VM_BUG_ON(!is_root_cache(s)); + VM_BUG_ON(!in_task()); + might_sleep(); + + return get_local_kmem_cache(s, current->mm, GFP_KERNEL) ? 
0 : -ENOMEM; +} +EXPORT_SYMBOL(kmem_cache_precreate_local); + +int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t flags) +{ + struct kmem_cache *s =3D kmalloc_slab(size, + flags | __GFP_LOCAL_NONSENSITIVE); + + if (ZERO_OR_NULL_PTR(s)) + return 0; + + return kmem_cache_precreate_local(s); +} +EXPORT_SYMBOL(kmem_cache_precreate_local_kmalloc); + +#endif --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FD3AC433EF for ; Wed, 23 Feb 2022 05:26:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238364AbiBWF04 (ORCPT ); Wed, 23 Feb 2022 00:26:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238222AbiBWF0E (ORCPT ); Wed, 23 Feb 2022 00:26:04 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 613406E2AA for ; Tue, 22 Feb 2022 21:24:53 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id i6-20020a255406000000b006241e97e420so18864439ybb.5 for ; Tue, 22 Feb 2022 21:24:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=qiOY3I7MTzDOyydGuOsoL3kibVHTWqt/T8NonRDSdjI=; b=gHD/ZzZ2afDsPzQ43uzTfHDeNudIA9BOq9xrr9L14+jP+dH61iVLZ9hzKNk6oCuBOI frJ4Y/pMW9lHMFuHIDT5LntquphVA/hdUftBkeKzRimB/OZsUvqrqt+OTldNL/5M2dvF +PIjZDytqBI9qNUICzwPtZ9WZreG3la5anScSUKXP0fumh99WG+FQaCeyfbwMohmDN9Z TqQY1Ne53dTgZlticBLJ/nLFoMZJ6npZslQ1FhKLNnIG+gBpH+SHHrgjBmREZgI8pcKn GtFQ6xyENvmtRWa1WOq2159X/570g6HXWtqzHRNADNoxBQZmRvsc5EkR5xI3zImo/8Xk CC5Q== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=qiOY3I7MTzDOyydGuOsoL3kibVHTWqt/T8NonRDSdjI=; b=tCCmwyiXaNGBUU0y083xTRbFoM+zVjiT/Vl6uxenVEShbrkdNhdfSXJ6cOobxf1Mdt Un5QqByfYzfmDj0uGajMYVn+1a7l83uk8byIpODfSxqqGS1r0rtargF1n4KK+1z56XvT tKyhsXPjSCvdL1q9jAseWVQq0Ksd7P1hshGB7o/UKtydQZAqL7/+Ykxst9qGll2nqA88 1IyhT7Cge7Rsxko448pjFoL3D9OV+ugbgr+pssolSEOuFtrSU5JwxZ5n+wMmU5hqrV61 KvX6VxX0RWi1N8rCAe1y0vn1xtU+hMHofmtQch6kYwgJf4wBKqDiZQgYTGeJ7TIsbKtN y6RQ== X-Gm-Message-State: AOAM532wm9DR7WNVRk/mp9qrwdMtyYjpW4cvVqjN4XLIcmfMyQaV5JpF WOCd+QrXm0dEW0nr25SU0JIRWBVYxuCbIR0jslxz8Z6cgeqXJX5nf/pZPfIqIgsU4Ny9lrfWUSs HJ000XN8Pj1JlCcgHh5YYUXCuIUadlJDYce5J5yArLApDKzI6MHsv9eAvi86UdClva+H5A8m1 X-Google-Smtp-Source: ABdhPJz0RUH+JeSNvxcFS2t+PaXJsk2Yx25dJd/bxX+okavoC9VlrjteMlukvkj4zkv9eNKI+iSQhOk6jOMd X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:bad2:0:b0:620:fe28:ff53 with SMTP id a18-20020a25bad2000000b00620fe28ff53mr26732357ybk.340.1645593881600; Tue, 22 Feb 2022 21:24:41 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:01 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-26-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 25/47] mm: asi: Avoid warning from NMI userspace accesses in ASI context From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; 
charset="utf-8" nmi_uaccess_okay() emits a warning if current CR3 !=3D mm->pgd. Limit the warning to only when ASI is not active. Signed-off-by: Junaid Shahid --- arch/x86/mm/tlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 25bee959d1d3..628f1cd904ac 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1292,7 +1292,8 @@ bool nmi_uaccess_okay(void) if (loaded_mm !=3D current_mm) return false; =20 - VM_WARN_ON_ONCE(current_mm->pgd !=3D __va(read_cr3_pa())); + VM_WARN_ON_ONCE(current_mm->pgd !=3D __va(read_cr3_pa()) && + !is_asi_active()); =20 return true; } --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86705C433FE for ; Wed, 23 Feb 2022 05:27:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238384AbiBWF1h (ORCPT ); Wed, 23 Feb 2022 00:27:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238240AbiBWF0E (ORCPT ); Wed, 23 Feb 2022 00:26:04 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 193576E2B5 for ; Tue, 22 Feb 2022 21:24:54 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id b64-20020a256743000000b0061e169a5f19so26555546ybc.11 for ; Tue, 22 Feb 2022 21:24:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=mHknd275MnGb/PaPBY3xJHVlcaSr+gb2g5yi23zkCMQ=; b=L60sUeBO3icqMJb7NMSmlVXjmIABeVNLCCtZzlOUINB+XKCgvwoL14gcA1/NI1wk1X 
/ltNckj3/3p7zfm1t1rOui84yRyyoV+XIZUZcTPfV3dYES6ZKCv3iaEJAriiM57Y+pJ2 8u/ZPuM69uvOpnXWpYYvEAAwrxzB6ihF2OY0/HqTUANIyhOPp6jmVN945B9gOsa+htR7 ghj+eGPzbx50ZpS/8LrSXhCGcEICsrhvK12TqRAuRKoUTOubcXhnkqoaG145HMstky5e ZFHHetF4BUFaO5ULQYQ3U7cD5n4sgmCCalGk7gwPkRzwbBni9d6kLYQ28QmTUA8pH14D Kjrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=mHknd275MnGb/PaPBY3xJHVlcaSr+gb2g5yi23zkCMQ=; b=yawOAZztp5HIFXRF/H9dZi9Jskz4mV6ENmXxc8rSulWGuU5lRc2auOjdDdpkuE6ruv AQVSC2MpFEpvyzGHNP21z7LSHNoHIvJRTBNJYhulUtAmvdG1FRu4QT9EjqzVw0zoDcwG 1TI8XWiCIqSXldrA0xZCc8Ll4ujwapkyPfT9hi74dfohc939RcCfzbYEfPjYhLZ67Pr6 9dCKgL1AfZ9w1Kq0XkMp9Zd0zWg7eTOfJHfPocz8RMi6c+0G4bc8weZ8ACQdbnXweHZn ovHo5c7j50vXQpexvH744HHLJx9j3HwtYituIvD6NmOrGPrJTArryzT88o5W6bITWQBB mBpQ== X-Gm-Message-State: AOAM533FFh/ND6ZbA/qdFqzFrACn2Di4vthQFbbX1Q3EKO+Bao59/sgP H1GPqyUOqLv7BIUj0Rzs3wtvAB6/N1UhJD+H1IebmT6j2s8aYL+f2QmrUuShrlUxK5hp+SasRGq 0Flyi9dC+mTsEaXdSBTtAVhMaEA26cInvrXHH303NYOzz1X5OytZpax9O1s2j39TE3djLD6Ug X-Google-Smtp-Source: ABdhPJw5GHEn07Q8rEslG5JDYumsMKlC/t9Gjkx79Rz/pi8IkAfQS2IAOFNhhCDkd7GvnzpI+G+qB+zNYrIt X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:8414:0:b0:2d0:fdd8:f7e2 with SMTP id u20-20020a818414000000b002d0fdd8f7e2mr27082928ywf.156.1645593883815; Tue, 22 Feb 2022 21:24:43 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:02 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-27-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 26/47] mm: asi: Use separate PCIDs for restricted address spaces From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, 
alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Each restricted address space is assigned a separate PCID. Since currently only one ASI instance per-class exists for a given process, the PCID is just derived from the class index. This commit only sets the appropriate PCID when switching CR3, but does not set the NOFLUSH bit. That will be done by later patches. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 3 ++- arch/x86/include/asm/tlbflush.h | 3 +++ arch/x86/mm/asi.c | 6 +++-- arch/x86/mm/tlb.c | 45 ++++++++++++++++++++++++++++++--- 4 files changed, 50 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 062ccac07fd9..aaa0d0bdbf59 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -40,7 +40,8 @@ struct asi { pgd_t *pgd; struct asi_class *class; struct mm_struct *mm; - int64_t asi_ref_count; + u16 pcid_index; + int64_t asi_ref_count; }; =20 DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 3c43ad46c14a..f9ec5e67e361 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -260,6 +260,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlb= flush_unmap_batch *batch, extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); =20 unsigned long build_cr3(pgd_t *pgd, u16 asid); +unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); + +u16 asi_pcid(struct asi *asi, u16 asid); =20 #endif /* !MODULE */ =20 diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 6b9a0f5ab391..dbfea3dc4bb1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -335,6 +335,7 @@ int 
asi_init(struct mm_struct *mm, int asi_index, struc= t asi **out_asi) =20 asi->class =3D &asi_class[asi_index]; asi->mm =3D mm; + asi->pcid_index =3D asi_index; =20 if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; @@ -386,6 +387,7 @@ EXPORT_SYMBOL_GPL(asi_destroy); void __asi_enter(void) { u64 asi_cr3; + u16 pcid; struct asi *target =3D this_cpu_read(asi_cpu_state.target_asi); =20 VM_BUG_ON(preemptible()); @@ -399,8 +401,8 @@ void __asi_enter(void) =20 this_cpu_write(asi_cpu_state.curr_asi, target); =20 - asi_cr3 =3D build_cr3(target->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + pcid =3D asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + asi_cr3 =3D build_cr3_pcid(target->pgd, pcid, false); write_cr3(asi_cr3); =20 if (target->class->ops.post_asi_enter) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 628f1cd904ac..312b9c185a55 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -97,7 +97,12 @@ # define PTI_CONSUMED_PCID_BITS 0 #endif =20 -#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) +#define ASI_CONSUMED_PCID_BITS ASI_MAX_NUM_ORDER +#define ASI_PCID_BITS_SHIFT CR3_AVAIL_PCID_BITS +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS - \ + ASI_CONSUMED_PCID_BITS) + +static_assert(TLB_NR_DYN_ASIDS < BIT(CR3_AVAIL_PCID_BITS)); =20 /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. 
-1 below to account @@ -154,6 +159,34 @@ static inline u16 user_pcid(u16 asid) return ret; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +u16 asi_pcid(struct asi *asi, u16 asid) +{ + return kern_pcid(asid) | (asi->pcid_index << ASI_PCID_BITS_SHIFT); +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +u16 asi_pcid(struct asi *asi, u16 asid) +{ + return kern_pcid(asid); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush) +{ + u64 noflush_bit =3D 0; + + if (!static_cpu_has(X86_FEATURE_PCID)) + pcid =3D 0; + else if (noflush) + noflush_bit =3D CR3_NOFLUSH; + + return __sme_pa(pgd) | pcid | noflush_bit; +} + inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { @@ -1078,13 +1111,17 @@ unsigned long __get_current_cr3_fast(void) pgd_t *pgd; u16 asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); struct asi *asi =3D asi_get_current(); + u16 pcid; =20 - if (asi) + if (asi) { pgd =3D asi_pgd(asi); - else + pcid =3D asi_pcid(asi, asid); + } else { pgd =3D this_cpu_read(cpu_tlbstate.loaded_mm)->pgd; + pcid =3D kern_pcid(asid); + } =20 - cr3 =3D build_cr3(pgd, asid); + cr3 =3D build_cr3_pcid(pgd, pcid, false); =20 /* For now, be very restrictive about when this can be called. 
*/ VM_WARN_ON(in_nmi() || preemptible()); --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 170BCC433FE for ; Wed, 23 Feb 2022 05:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238358AbiBWF1b (ORCPT ); Wed, 23 Feb 2022 00:27:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57910 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238289AbiBWF0T (ORCPT ); Wed, 23 Feb 2022 00:26:19 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A5AC6E36A for ; Tue, 22 Feb 2022 21:24:57 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d6994a6942so160471867b3.17 for ; Tue, 22 Feb 2022 21:24:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=uZNDncAP+HAcrLlxbbBV0jFwf5VHCLxBgZuEl9jtIsw=; b=r7+qr6EGJVpdhQWG8PE8sNYbh0H6kTLpdH6ctOr0wPyBw/Z56OW3u31LYarb4QOnRX rF4lg3l0IVEqPlgJWaO9d3dWxAEdyJe1HcTUm1KWff1m3xuyDOy9VVB9sTaWZhhffwst nktPdcdn5pMY1JLUsndyvZTdi+apZc+M8CANZ+FleFcjiBh1KMmN3V5cV4UkdqGp6dj7 eto63Nt1iYZsRhYWTV9r7EgsApuw+VuQRiPkO/uEZytDeujtWN47qFOJnrbnHuYu8/o4 722Zq0PbXoAaWsxp0efHDuBM6HhxyWCGqP/XygpW0eZmFRIaDvse+6/DxrUVYKJYRCUo G3aA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=uZNDncAP+HAcrLlxbbBV0jFwf5VHCLxBgZuEl9jtIsw=; b=cIU2wgfX6pmjqPYZxIDMT11YRSl3QuIj8NOJAli42QPlOB6wlBdWXKxJvdHydZUUMz 
nmf1qzithnudyb4+YaRUYkxRk6WA5ZgXesE9aJVASxXzvBxU3I0hkxwjDgULRFs3FzFF UsEk4DtWaOctNKi9CwCNRaATSxxkSxlWN79GKoNn5lSxkK1sMxMoXQmU3+Dsze5v2owU X5ix6cIxjH78mhTQlTgDDW0rllPQZCetTnKHXoCIHdf1lBPOeIaDb+AJqGQFXHTOJvCo w2rChE4Z7o2/JX10nxcTg+tGe9jQZgNbIqBY7BZou5wn2fk1CBbW1yINsG+v98rjsXgb mknQ== X-Gm-Message-State: AOAM5329j3peQiLrkv27hDtoswCic4qHWj5u61YpPiJ1Vvn3Toed/3Pt DFdMLJO3KGxZ60HPZQzRfcRruR3oEhrORh9ZbFtEFzTUpRVB8eqq+1YJ+Gqf17b2BzKDHhqmkSi blVDkuJkwlxEPw6oNrxyD8u1hWxYq5RhWEvvMAQ4Go+C1cHUOBtSTSHtrAIbqF/7cptXPqsYr X-Google-Smtp-Source: ABdhPJyGbSHeEEC3fevrTXJ9nvduL9Idz1OuetENZ+6YGEpuvdg8wlXwAyrhsHfA5UVeantdGlz2RTX/R7Xv X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6993:0:b0:624:55af:336c with SMTP id e141-20020a256993000000b0062455af336cmr19351145ybc.412.1645593885940; Tue, 22 Feb 2022 21:24:45 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:03 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-28-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 27/47] mm: asi: Avoid TLB flushes during ASI CR3 switches when possible From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The TLB flush functions are modified to flush the ASI PCIDs in addition to the unrestricted kernel PCID and the KPTI PCID. Some tracking is also added to figure out when the TLB state for ASI PCIDs is out-of-date (e.g. 
due to lack of INVPCID support), and ASI Enter/Exit use this information to skip a TLB flush during the CR3 switch when the TLB is already up-to-date. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 11 ++- arch/x86/include/asm/tlbflush.h | 47 ++++++++++ arch/x86/mm/asi.c | 38 +++++++- arch/x86/mm/tlb.c | 152 ++++++++++++++++++++++++++++++-- 4 files changed, 234 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index aaa0d0bdbf59..1a77917c79c7 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -126,11 +126,18 @@ static inline void asi_intr_exit(void) if (static_cpu_has(X86_FEATURE_ASI)) { barrier(); =20 - if (--current->thread.intr_nest_depth =3D=3D 0) + if (--current->thread.intr_nest_depth =3D=3D 0) { + barrier(); __asi_enter(); + } } } =20 +static inline int asi_intr_nest_depth(void) +{ + return current->thread.intr_nest_depth; +} + #define INIT_MM_ASI(init_mm) \ .asi =3D { \ [0] =3D { \ @@ -150,6 +157,8 @@ static inline void asi_intr_enter(void) { } =20 static inline void asi_intr_exit(void) { } =20 +static inline int asi_intr_nest_depth(void) { return 0; } + static inline void asi_init_thread_state(struct thread_struct *thread) { } =20 static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index f9ec5e67e361..295bebdb4395 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -12,6 +12,7 @@ #include #include #include +#include =20 void __flush_tlb_all(void); =20 @@ -59,9 +60,20 @@ static inline void cr4_clear_bits(unsigned long mask) */ #define TLB_NR_DYN_ASIDS 6 =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct asi_tlb_context { + bool flush_pending; +}; + +#endif + struct tlb_context { u64 ctx_id; u64 tlb_gen; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct asi_tlb_context asi_context[ASI_MAX_NUM]; +#endif }; =20 struct tlb_state { @@ -100,6 +112,10 
@@ struct tlb_state { */ bool invalidate_other; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* If set, ASI Exit needs to do a TLB flush during the CR3 switch */ + bool kern_pcid_needs_flush; +#endif /* * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate * the corresponding user PCID needs a flush next time we @@ -262,8 +278,39 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_u= nmap_batch *batch); unsigned long build_cr3(pgd_t *pgd, u16 asid); unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); =20 +u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static inline bool *__asi_tlb_flush_pending(struct asi *asi) +{ + struct tlb_state *tlb_state; + struct tlb_context *tlb_context; + + tlb_state =3D this_cpu_ptr(&cpu_tlbstate); + tlb_context =3D &tlb_state->ctxs[tlb_state->loaded_mm_asid]; + return &tlb_context->asi_context[asi->pcid_index].flush_pending; +} + +static inline bool asi_get_and_clear_tlb_flush_pending(struct asi *asi) +{ + bool *tlb_flush_pending_ptr =3D __asi_tlb_flush_pending(asi); + bool tlb_flush_pending =3D READ_ONCE(*tlb_flush_pending_ptr); + + if (tlb_flush_pending) + WRITE_ONCE(*tlb_flush_pending_ptr, false); + + return tlb_flush_pending; +} + +static inline void asi_clear_pending_tlb_flush(struct asi *asi) +{ + WRITE_ONCE(*__asi_tlb_flush_pending(asi), false); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif /* !MODULE */ =20 #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index dbfea3dc4bb1..17b8e6e60312 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -388,6 +388,7 @@ void __asi_enter(void) { u64 asi_cr3; u16 pcid; + bool need_flush =3D false; struct asi *target =3D this_cpu_read(asi_cpu_state.target_asi); =20 VM_BUG_ON(preemptible()); @@ -401,8 +402,18 @@ void __asi_enter(void) =20 this_cpu_write(asi_cpu_state.curr_asi, target); =20 + if (static_cpu_has(X86_FEATURE_PCID)) + need_flush =3D 
asi_get_and_clear_tlb_flush_pending(target); + + /* + * It is possible that we may get a TLB flush IPI after + * already reading need_flush, in which case we won't do the + * flush below. However, in that case the interrupt epilog + * will also call __asi_enter(), which will do the flush. + */ + pcid =3D asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); - asi_cr3 =3D build_cr3_pcid(target->pgd, pcid, false); + asi_cr3 =3D build_cr3_pcid(target->pgd, pcid, !need_flush); write_cr3(asi_cr3); =20 if (target->class->ops.post_asi_enter) @@ -437,12 +448,31 @@ void asi_exit(void) asi =3D this_cpu_read(asi_cpu_state.curr_asi); =20 if (asi) { + bool need_flush =3D false; + if (asi->class->ops.pre_asi_exit) asi->class->ops.pre_asi_exit(); =20 - unrestricted_cr3 =3D - build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + if (static_cpu_has(X86_FEATURE_PCID) && + !static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + need_flush =3D this_cpu_read( + cpu_tlbstate.kern_pcid_needs_flush); + this_cpu_write(cpu_tlbstate.kern_pcid_needs_flush, + false); + } + + /* + * It is possible that we may get a TLB flush IPI after + * already reading need_flush. However, in that case the IPI + * will not set flush_pending for the unrestricted address + * space, as that is done by flush_tlb_one_user() only if + * asi_intr_nest_depth() is 0. 
+ */ + + unrestricted_cr3 =3D build_cr3_pcid( + this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, + kern_pcid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)), + !need_flush); =20 write_cr3(unrestricted_cr3); this_cpu_write(asi_cpu_state.curr_asi, NULL); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 312b9c185a55..5c9681df3a16 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -114,7 +114,7 @@ static_assert(TLB_NR_DYN_ASIDS < BIT(CR3_AVAIL_PCID_BIT= S)); /* * Given @asid, compute kPCID */ -static inline u16 kern_pcid(u16 asid) +inline u16 kern_pcid(u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); =20 @@ -166,6 +166,60 @@ u16 asi_pcid(struct asi *asi, u16 asid) return kern_pcid(asid) | (asi->pcid_index << ASI_PCID_BITS_SHIFT); } =20 +static void invalidate_kern_pcid(void) +{ + this_cpu_write(cpu_tlbstate.kern_pcid_needs_flush, true); +} + +static void invalidate_asi_pcid(struct asi *asi, u16 asid) +{ + uint i; + struct asi_tlb_context *asi_tlb_context; + + if (!static_cpu_has(X86_FEATURE_ASI) || + !static_cpu_has(X86_FEATURE_PCID)) + return; + + asi_tlb_context =3D this_cpu_ptr(cpu_tlbstate.ctxs[asid].asi_context); + + if (asi) + asi_tlb_context[asi->pcid_index].flush_pending =3D true; + else + for (i =3D 1; i < ASI_MAX_NUM; i++) + asi_tlb_context[i].flush_pending =3D true; +} + +static void flush_asi_pcid(struct asi *asi) +{ + u16 asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); + /* + * The flag should be cleared before the INVPCID, to avoid clearing it + * in case an interrupt/exception sets it again after the INVPCID. 
+ */ + asi_clear_pending_tlb_flush(asi); + invpcid_flush_single_context(asi_pcid(asi, asid)); +} + +static void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t addr) +{ + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + invalidate_asi_pcid(asi, asid); + } else if (asi) { + invpcid_flush_one(asi_pcid(asi, asid), addr); + } else { + uint i; + struct mm_struct *mm =3D this_cpu_read(cpu_tlbstate.loaded_mm); + + for (i =3D 1; i < ASI_MAX_NUM; i++) + if (mm->asi[i].pgd) + invpcid_flush_one(asi_pcid(&mm->asi[i], asid), + addr); + } +} + #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 u16 asi_pcid(struct asi *asi, u16 asid) @@ -173,6 +227,11 @@ u16 asi_pcid(struct asi *asi, u16 asid) return kern_pcid(asid); } =20 +static inline void invalidate_kern_pcid(void) { } +static inline void invalidate_asi_pcid(struct asi *asi, u16 asid) { } +static inline void flush_asi_pcid(struct asi *asi) { } +static inline void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t a= ddr) { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush) @@ -223,7 +282,8 @@ static void clear_asid_other(void) * This is only expected to be set if we have disabled * kernel _PAGE_GLOBAL pages. 
*/ - if (!static_cpu_has(X86_FEATURE_PTI)) { + if (!static_cpu_has(X86_FEATURE_PTI) && + !cpu_feature_enabled(X86_FEATURE_ASI)) { WARN_ON_ONCE(1); return; } @@ -313,6 +373,7 @@ static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid,= bool need_flush) =20 if (need_flush) { invalidate_user_asid(new_asid); + invalidate_asi_pcid(NULL, new_asid); new_mm_cr3 =3D build_cr3(pgdir, new_asid); } else { new_mm_cr3 =3D build_cr3_noflush(pgdir, new_asid); @@ -741,11 +802,17 @@ void initialize_tlbstate_and_flush(void) this_cpu_write(cpu_tlbstate.next_asid, 1); this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen); + invalidate_asi_pcid(NULL, 0); =20 for (i =3D 1; i < TLB_NR_DYN_ASIDS; i++) this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0); } =20 +static inline void invlpg(unsigned long addr) +{ + asm volatile("invlpg (%0)" ::"r"(addr) : "memory"); +} + /* * flush_tlb_func()'s memory ordering requirement is that any * TLB fills that happen after we flush the TLB are ordered after we @@ -967,7 +1034,8 @@ void flush_tlb_multi(const struct cpumask *cpumask, * least 95%) of allocations, and is small enough that we are * confident it will not cause too much overhead. Each single * flush is about 100 ns, so this caps the maximum overhead at - * _about_ 3,000 ns. + * _about_ 3,000 ns (plus up to an additional ~3,000 ns for each + * ASI instance, or for KPTI). * * This is in units of pages.
*/ @@ -1157,7 +1225,8 @@ void flush_tlb_one_kernel(unsigned long addr) */ flush_tlb_one_user(addr); =20 - if (!static_cpu_has(X86_FEATURE_PTI)) + if (!static_cpu_has(X86_FEATURE_PTI) && + !cpu_feature_enabled(X86_FEATURE_ASI)) return; =20 /* @@ -1174,9 +1243,45 @@ void flush_tlb_one_kernel(unsigned long addr) */ STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr) { - u32 loaded_mm_asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); + u16 loaded_mm_asid; =20 - asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); + if (!static_cpu_has(X86_FEATURE_PCID)) { + invlpg(addr); + return; + } + + loaded_mm_asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); + + /* + * If we don't have INVPCID support, then we do an ASI Exit so that + * the invlpg happens in the unrestricted address space, and we + * invalidate the ASI PCID so that it is flushed at the next ASI Enter. + * + * But if a valid target ASI is set, then an ASI Exit can be ephemeral + * due to interrupts/exceptions/NMIs (except if we are already inside + * one), so we just invalidate both the ASI and the unrestricted kernel + * PCIDs and let the invlpg flush whichever happens to be the current + * address space. This is a bit more wasteful, but this scenario is not + * actually expected to occur with the current usage of ASI, and is + * handled here just for completeness. (If we wanted to optimize this, + * we could manipulate the intr_nest_depth to guarantee that an ASI + * Exit is not ephemeral). 
+ */ + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() =3D=3D 0) + invalidate_kern_pcid(); + else + asi_exit(); + } + + /* Flush the unrestricted kernel address space */ + if (!is_asi_active()) + invlpg(addr); + else + invpcid_flush_one(kern_pcid(loaded_mm_asid), addr); + + __flush_tlb_one_asi(NULL, loaded_mm_asid, addr); =20 if (!static_cpu_has(X86_FEATURE_PTI)) return; @@ -1235,6 +1340,9 @@ STATIC_NOPV void native_flush_tlb_global(void) */ STATIC_NOPV void native_flush_tlb_local(void) { + struct asi *asi; + u16 asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); + /* * Preemption or interrupts must be disabled to protect the access * to the per CPU variable and to prevent being preempted between @@ -1242,10 +1350,36 @@ STATIC_NOPV void native_flush_tlb_local(void) */ WARN_ON_ONCE(preemptible()); =20 - invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + /* + * If we don't have INVPCID support, then we have to use + * write_cr3(read_cr3()). However, that is not safe when ASI is active, + * as an interrupt/exception/NMI could cause an ASI Exit in the middle + * and change CR3. So we trigger an ASI Exit beforehand. But if a valid + * target ASI is set, then an ASI Exit can also be ephemeral due to + * interrupts (except if we are already inside one), and thus we have to + * fall back to a global TLB flush.
+ */ + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() =3D=3D 0) { + native_flush_tlb_global(); + return; + } + asi_exit(); + } =20 - /* If current->mm =3D=3D NULL then the read_cr3() "borrows" an mm */ - native_write_cr3(__native_read_cr3()); + invalidate_user_asid(asid); + invalidate_asi_pcid(NULL, asid); + + asi =3D asi_get_current(); + + if (!asi) { + /* If current->mm =3D=3D NULL then the read_cr3() "borrows" an mm */ + native_write_cr3(__native_read_cr3()); + } else { + invpcid_flush_single_context(kern_pcid(asid)); + flush_asi_pcid(asi); + } } =20 void flush_tlb_local(void) --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 185B4C433F5 for ; Wed, 23 Feb 2022 05:26:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238371AbiBWF1V (ORCPT ); Wed, 23 Feb 2022 00:27:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238353AbiBWF0V (ORCPT ); Wed, 23 Feb 2022 00:26:21 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C9D76CA51 for ; Tue, 22 Feb 2022 21:25:01 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d6994a6942so160472707b3.17 for ; Tue, 22 Feb 2022 21:25:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ht+B2vhllT9TUcS9YCbehkBKQfQF+JUOyLFYYS9ZTII=; b=X4LjelXhHrsgUAmOKDupgWtkMo3qDIfheiClzbfmM97+mtJS0Ry/l7ad7kEFvNmAMO 
ru8xaZ7F2Wij3ltuM/YwpEuAwo9u28wNABujo/9gA7WhQcealUnUhGS4exGNNI00mM/h QXyufTreNJLTmpGLik/8e62TY2RamcG4YVt1a9kq7jRUpJ5gpqeZZz9Udb++OBULoX7j 8m+I6tqS0xRBkRNDUPy1xvHpGNE8YXXtkcOVcVJNJbGCsF4uJmSxV3FYbdH5JUcqjBWM ug6RC3+13tjcavk58yKUhJSjRMK2SZqloPpTOfJuLf4aHFrFV1j6/rXU3Dwv5NYwIvsx Wd6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ht+B2vhllT9TUcS9YCbehkBKQfQF+JUOyLFYYS9ZTII=; b=o7BRIep38gAX10paYtEb/fLl2/NVIkdge8VFKdIs3xjrjzsoXgNIo5sIDKmPDut+EQ x6oOeJHIzU+MT2rgrozBCGYCWpwF+VWVbBXR6T1JXSy8lSili5awbtjZFLLyYoZz1/7J hz3DbeJflP0792bAqOIJsdULV00/9rOZ6bkJM/W5hcfk3+sE/nCRbTVTZ863bZEumPsf huc/4MH/MIaF20nDVhjRcW7ZjS09W5nL99V6l7TH8LshwWTfF6/mcd8EXhN3l/N8//th vFn/CTvWYcDmr+C9mjH0bXlSEStHPljdTtAHg2Ssfd+TrxHhcdsEsXINDmO+zWwA3kfM /aTw== X-Gm-Message-State: AOAM530lT1pFXoPaqr8QIPxMEH9pWVh1bqgPpxSvCrD2lOGRdxVsclEl GY1017Yzvqk3wia9FwH0zVDbcw7Kp5/cdov5d6TKKuIGUB71XM0JJnwebov1U7UqFpuCI32d+tZ kPFElp0CAkXbI5y80WGC7eAfRgfZILeRdAq1hm77LHBOfg8O36eZG4Sdm1UkkkiGzUkBEc4Bm X-Google-Smtp-Source: ABdhPJzjFJlAylJJF1esEWWn/fBYg6FPQgSZQ8x6J/wS+NLWdoHP+2b1OSKZqRs0ZEPrSWXIKV7/9dmH6JQ/ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6b4d:0:b0:624:7295:42ee with SMTP id o13-20020a256b4d000000b00624729542eemr15381999ybm.290.1645593888119; Tue, 22 Feb 2022 21:24:48 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:04 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-29-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 28/47] mm: asi: Avoid TLB flush IPIs to CPUs not in ASI context From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, 
alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Most CPUs will not be running in a restricted ASI address space at any given time. So when we need to do an ASI TLB flush, we can skip those CPUs and let them do a flush at the time of the next ASI Enter. Furthermore, for flushes related to local non-sensitive memory, we can restrict the CPU set even further to those CPUs that have that specific mm_struct loaded. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 9 +- arch/x86/include/asm/tlbflush.h | 47 +++---- arch/x86/mm/asi.c | 73 +++++++++-- arch/x86/mm/tlb.c | 209 ++++++++++++++++++++++++++++++-- 4 files changed, 282 insertions(+), 56 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 1a77917c79c7..35421356584b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -41,6 +41,8 @@ struct asi { struct asi_class *class; struct mm_struct *mm; u16 pcid_index; + atomic64_t *tlb_gen; + atomic64_t __tlb_gen; int64_t asi_ref_count; }; =20 @@ -138,11 +140,16 @@ static inline int asi_intr_nest_depth(void) return current->thread.intr_nest_depth; } =20 +void asi_get_latest_tlb_gens(struct asi *asi, u64 *latest_local_tlb_gen, + u64 *latest_global_tlb_gen); + #define INIT_MM_ASI(init_mm) \ .asi =3D { \ [0] =3D { \ .pgd =3D asi_global_nonsensitive_pgd, \ - .mm =3D &init_mm \ + .mm =3D &init_mm, \ + .__tlb_gen =3D ATOMIC64_INIT(1), \ + .tlb_gen =3D &init_mm.asi[0].__tlb_gen \ } \ }, =20 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 295bebdb4395..85315d1d2d70 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -63,7 +63,8 @@ static inline void cr4_clear_bits(unsigned long mask) 
#ifdef CONFIG_ADDRESS_SPACE_ISOLATION =20 struct asi_tlb_context { - bool flush_pending; + u64 local_tlb_gen; + u64 global_tlb_gen; }; =20 #endif @@ -223,6 +224,20 @@ struct flush_tlb_info { unsigned int initiating_cpu; u8 stride_shift; u8 freed_tables; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * We can't use the mm pointer above, as there can be some cases where + * the mm is already freed. Of course, a flush wouldn't be necessary + * in that case, and we would know that when we compare the context ID. + * + * If U64_MAX, then a global flush would be done. + */ + u64 mm_context_id; + + /* If non-zero, flush only the ASI instance with this PCID index. */ + u16 asi_pcid_index; +#endif }; =20 void flush_tlb_local(void); @@ -281,36 +296,6 @@ unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, boo= l noflush); u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); =20 -#ifdef CONFIG_ADDRESS_SPACE_ISOLATION - -static inline bool *__asi_tlb_flush_pending(struct asi *asi) -{ - struct tlb_state *tlb_state; - struct tlb_context *tlb_context; - - tlb_state =3D this_cpu_ptr(&cpu_tlbstate); - tlb_context =3D &tlb_state->ctxs[tlb_state->loaded_mm_asid]; - return &tlb_context->asi_context[asi->pcid_index].flush_pending; -} - -static inline bool asi_get_and_clear_tlb_flush_pending(struct asi *asi) -{ - bool *tlb_flush_pending_ptr =3D __asi_tlb_flush_pending(asi); - bool tlb_flush_pending =3D READ_ONCE(*tlb_flush_pending_ptr); - - if (tlb_flush_pending) - WRITE_ONCE(*tlb_flush_pending_ptr, false); - - return tlb_flush_pending; -} - -static inline void asi_clear_pending_tlb_flush(struct asi *asi) -{ - WRITE_ONCE(*__asi_tlb_flush_pending(asi), false); -} - -#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ - #endif /* !MODULE */ =20 #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 17b8e6e60312..29c74b6d4262 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -355,6 +355,11 @@ int asi_init(struct mm_struct *mm, int asi_index, 
stru= ct asi **out_asi) for (i =3D pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + + asi->tlb_gen =3D &mm->asi[0].__tlb_gen; + } else { + asi->tlb_gen =3D &asi->__tlb_gen; + atomic64_set(asi->tlb_gen, 1); } =20 exit_unlock: @@ -384,11 +389,26 @@ void asi_destroy(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_destroy); =20 +void asi_get_latest_tlb_gens(struct asi *asi, u64 *latest_local_tlb_gen, + u64 *latest_global_tlb_gen) +{ + if (likely(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE)) + *latest_global_tlb_gen =3D + atomic64_read(ASI_GLOBAL_NONSENSITIVE->tlb_gen); + else + *latest_global_tlb_gen =3D 0; + + *latest_local_tlb_gen =3D atomic64_read(asi->tlb_gen); +} + void __asi_enter(void) { u64 asi_cr3; u16 pcid; bool need_flush =3D false; + u64 latest_local_tlb_gen, latest_global_tlb_gen; + struct tlb_state *tlb_state; + struct asi_tlb_context *tlb_context; struct asi *target =3D this_cpu_read(asi_cpu_state.target_asi); =20 VM_BUG_ON(preemptible()); @@ -397,17 +417,35 @@ void __asi_enter(void) if (!target || target =3D=3D this_cpu_read(asi_cpu_state.curr_asi)) return; =20 - VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) =3D=3D - LOADED_MM_SWITCHING); + tlb_state =3D this_cpu_ptr(&cpu_tlbstate); + VM_BUG_ON(tlb_state->loaded_mm =3D=3D LOADED_MM_SWITCHING); =20 this_cpu_write(asi_cpu_state.curr_asi, target); =20 - if (static_cpu_has(X86_FEATURE_PCID)) - need_flush =3D asi_get_and_clear_tlb_flush_pending(target); + if (static_cpu_has(X86_FEATURE_PCID)) { + /* + * curr_asi write has to happen before the asi->tlb_gen reads + * below. + * + * See comments in asi_flush_tlb_range(). 
+ */ + smp_mb(); + + asi_get_latest_tlb_gens(target, &latest_local_tlb_gen, + &latest_global_tlb_gen); + + tlb_context =3D &tlb_state->ctxs[tlb_state->loaded_mm_asid] + .asi_context[target->pcid_index]; + + if (READ_ONCE(tlb_context->local_tlb_gen) < latest_local_tlb_gen + || READ_ONCE(tlb_context->global_tlb_gen) < + latest_global_tlb_gen) + need_flush =3D true; + } =20 /* * It is possible that we may get a TLB flush IPI after - * already reading need_flush, in which case we won't do the + * already calculating need_flush, in which case we won't do the * flush below. However, in that case the interrupt epilog * will also call __asi_enter(), which will do the flush. */ @@ -416,6 +454,23 @@ void __asi_enter(void) asi_cr3 =3D build_cr3_pcid(target->pgd, pcid, !need_flush); write_cr3(asi_cr3); =20 + if (static_cpu_has(X86_FEATURE_PCID)) { + /* + * There is a small possibility that an interrupt happened + * after the read of the latest_*_tlb_gen above and when + * that interrupt did an asi_enter() upon return, it read + * an even higher latest_*_tlb_gen and already updated the + * tlb_context->*tlb_gen accordingly. In that case, the + * following will move back the tlb_context->*tlb_gen. That + * isn't ideal, but it should not cause any correctness issues. + * We may just end up doing an unnecessary TLB flush on the next + * asi_enter(). If we really needed to avoid that, we could + * just do a cmpxchg, but it is likely not necessary. 
+ */ + WRITE_ONCE(tlb_context->local_tlb_gen, latest_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, latest_global_tlb_gen); + } + if (target->class->ops.post_asi_enter) target->class->ops.post_asi_enter(); } @@ -504,6 +559,8 @@ int asi_init_mm_state(struct mm_struct *mm) if (!mm->asi_enabled) return 0; =20 + mm->asi[0].tlb_gen =3D &mm->asi[0].__tlb_gen; + atomic64_set(mm->asi[0].tlb_gen, 1); mm->asi[0].mm =3D mm; mm->asi[0].pgd =3D (pgd_t *)__get_free_page(GFP_PGTABLE_USER); if (!mm->asi[0].pgd) @@ -718,12 +775,6 @@ void asi_unmap(struct asi *asi, void *addr, size_t len= , bool flush_tlb) asi_flush_tlb_range(asi, addr, len); } =20 -void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) -{ - /* Later patches will do a more optimized flush. */ - flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); -} - void *asi_va(unsigned long pa) { struct page *page =3D pfn_to_page(PHYS_PFN(pa)); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 5c9681df3a16..2a442335501f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -31,6 +31,8 @@ # define __flush_tlb_multi(msk, info) native_flush_tlb_multi(msk, info) #endif =20 +STATIC_NOPV void native_flush_tlb_global(void); + /* * TLB flushing, formerly SMP-only * c/o Linus Torvalds. 
@@ -173,7 +175,6 @@ static void invalidate_kern_pcid(void) =20 static void invalidate_asi_pcid(struct asi *asi, u16 asid) { - uint i; struct asi_tlb_context *asi_tlb_context; =20 if (!static_cpu_has(X86_FEATURE_ASI) || @@ -183,21 +184,30 @@ static void invalidate_asi_pcid(struct asi *asi, u16 = asid) asi_tlb_context =3D this_cpu_ptr(cpu_tlbstate.ctxs[asid].asi_context); =20 if (asi) - asi_tlb_context[asi->pcid_index].flush_pending =3D true; + asi_tlb_context[asi->pcid_index] =3D + (struct asi_tlb_context) { 0 }; else - for (i =3D 1; i < ASI_MAX_NUM; i++) - asi_tlb_context[i].flush_pending =3D true; + memset(asi_tlb_context, 0, + sizeof(struct asi_tlb_context) * ASI_MAX_NUM); } =20 static void flush_asi_pcid(struct asi *asi) { u16 asid =3D this_cpu_read(cpu_tlbstate.loaded_mm_asid); - /* - * The flag should be cleared before the INVPCID, to avoid clearing it - * in case an interrupt/exception sets it again after the INVPCID. - */ - asi_clear_pending_tlb_flush(asi); + struct asi_tlb_context *tlb_context =3D this_cpu_ptr( + &cpu_tlbstate.ctxs[asid].asi_context[asi->pcid_index]); + u64 latest_local_tlb_gen =3D atomic64_read(asi->tlb_gen); + u64 latest_global_tlb_gen =3D atomic64_read( + ASI_GLOBAL_NONSENSITIVE->tlb_gen); + invpcid_flush_single_context(asi_pcid(asi, asid)); + + /* + * This could sometimes move the *_tlb_gen backwards. See comments + * in __asi_enter(). 
+ */ + WRITE_ONCE(tlb_context->local_tlb_gen, latest_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, latest_global_tlb_gen); } =20 static void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t addr) @@ -1050,7 +1060,7 @@ static DEFINE_PER_CPU(unsigned int, flush_tlb_info_id= x); static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables, - u64 new_tlb_gen) + u64 new_tlb_gen, u64 mm_ctx_id, u16 asi_pcid_index) { struct flush_tlb_info *info =3D this_cpu_ptr(&flush_tlb_info); =20 @@ -1071,6 +1081,11 @@ static struct flush_tlb_info *get_flush_tlb_info(str= uct mm_struct *mm, info->new_tlb_gen =3D new_tlb_gen; info->initiating_cpu =3D smp_processor_id(); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + info->mm_context_id =3D mm_ctx_id; + info->asi_pcid_index =3D asi_pcid_index; +#endif + return info; } =20 @@ -1104,7 +1119,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigne= d long start, new_tlb_gen =3D inc_mm_tlb_gen(mm); =20 info =3D get_flush_tlb_info(mm, start, end, stride_shift, freed_tables, - new_tlb_gen); + new_tlb_gen, 0, 0); =20 /* * flush_tlb_multi() is not optimized for the common case in which only @@ -1157,7 +1172,7 @@ void flush_tlb_kernel_range(unsigned long start, unsi= gned long end) struct flush_tlb_info *info; =20 preempt_disable(); - info =3D get_flush_tlb_info(NULL, start, end, 0, false, 0); + info =3D get_flush_tlb_info(NULL, start, end, 0, false, 0, 0, 0); =20 on_each_cpu(do_kernel_range_flush, info, 1); =20 @@ -1166,6 +1181,174 @@ void flush_tlb_kernel_range(unsigned long start, un= signed long end) } } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static inline void invlpg_range(size_t start, size_t end, size_t stride) +{ + size_t addr; + + for (addr =3D start; addr < end; addr +=3D stride) + invlpg(addr); +} + +static bool asi_needs_tlb_flush(struct asi *asi, struct flush_tlb_info *in= fo) +{ + if (!asi || + (info->mm_context_id 
!=3D U64_MAX && + info->mm_context_id !=3D asi->mm->context.ctx_id) || + (info->asi_pcid_index && info->asi_pcid_index !=3D asi->pcid_index)) + return false; + + if (unlikely(!(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE)) && + (info->mm_context_id =3D=3D U64_MAX || !info->asi_pcid_index)) + return false; + + return true; +} + +static void __flush_asi_tlb_all(struct asi *asi) +{ + if (static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + flush_asi_pcid(asi); + return; + } + + /* See comments in native_flush_tlb_local() */ + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() =3D=3D 0) { + native_flush_tlb_global(); + return; + } + + /* Let the next ASI Enter do the flush */ + asi_exit(); +} + +static void do_asi_tlb_flush(void *data) +{ + struct flush_tlb_info *info =3D data; + struct tlb_state *tlb_state =3D this_cpu_ptr(&cpu_tlbstate); + struct asi_tlb_context *tlb_context; + struct asi *asi =3D asi_get_current(); + u64 latest_local_tlb_gen, latest_global_tlb_gen; + u64 curr_local_tlb_gen, curr_global_tlb_gen; + u64 new_local_tlb_gen, new_global_tlb_gen; + bool do_flush_all; + + count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED); + + if (!asi_needs_tlb_flush(asi, info)) + return; + + do_flush_all =3D info->end - info->start > + (tlb_single_page_flush_ceiling << PAGE_SHIFT); + + if (!static_cpu_has(X86_FEATURE_PCID)) { + if (do_flush_all) + __flush_asi_tlb_all(asi); + else + invlpg_range(info->start, info->end, PAGE_SIZE); + return; + } + + tlb_context =3D &tlb_state->ctxs[tlb_state->loaded_mm_asid] + .asi_context[asi->pcid_index]; + + asi_get_latest_tlb_gens(asi, &latest_local_tlb_gen, + &latest_global_tlb_gen); + + curr_local_tlb_gen =3D READ_ONCE(tlb_context->local_tlb_gen); + curr_global_tlb_gen =3D READ_ONCE(tlb_context->global_tlb_gen); + + if (info->mm_context_id =3D=3D U64_MAX) { + new_global_tlb_gen =3D info->new_tlb_gen; + new_local_tlb_gen =3D curr_local_tlb_gen; + } else { + new_local_tlb_gen =3D info->new_tlb_gen; + new_global_tlb_gen 
=3D curr_global_tlb_gen; + } + + /* Somebody already did a full flush */ + if (new_local_tlb_gen <=3D curr_local_tlb_gen && + new_global_tlb_gen <=3D curr_global_tlb_gen) + return; + + /* + * If we can't bring the TLB up-to-date with a range flush, then do a + * full flush anyway. + */ + if (do_flush_all || !(new_local_tlb_gen =3D=3D latest_local_tlb_gen && + new_global_tlb_gen =3D=3D latest_global_tlb_gen && + new_local_tlb_gen <=3D curr_local_tlb_gen + 1 && + new_global_tlb_gen <=3D curr_global_tlb_gen + 1)) { + __flush_asi_tlb_all(asi); + return; + } + + invlpg_range(info->start, info->end, PAGE_SIZE); + + /* + * If we are still in ASI context, then all the INVLPGs flushed the + * ASI PCID and so we can update the tlb_gens. + */ + if (asi_get_current() =3D=3D asi) { + WRITE_ONCE(tlb_context->local_tlb_gen, new_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, new_global_tlb_gen); + } +} + +static bool is_asi_active_on_cpu(int cpu, void *info) +{ + return per_cpu(asi_cpu_state.curr_asi, cpu); +} + +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + size_t start =3D (size_t)addr; + size_t end =3D start + len; + struct flush_tlb_info *info; + u64 mm_context_id; + const cpumask_t *cpu_mask; + u64 new_tlb_gen =3D 0; + + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + if (static_cpu_has(X86_FEATURE_PCID)) { + new_tlb_gen =3D atomic64_inc_return(asi->tlb_gen); + + /* + * The increment of tlb_gen must happen before the curr_asi + * reads in is_asi_active_on_cpu(). That ensures that if another + * CPU is in asi_enter() and happens to write to curr_asi after + * is_asi_active_on_cpu() read it, it will see the updated + * tlb_gen and perform a flush during the TLB switch. 
+ */ + smp_mb__after_atomic(); + } + + preempt_disable(); + + if (asi =3D=3D ASI_GLOBAL_NONSENSITIVE) { + mm_context_id =3D U64_MAX; + cpu_mask =3D cpu_online_mask; + } else { + mm_context_id =3D asi->mm->context.ctx_id; + cpu_mask =3D mm_cpumask(asi->mm); + } + + info =3D get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, + mm_context_id, asi->pcid_index); + + on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, + true, cpu_mask); + + put_flush_tlb_info(); + preempt_enable(); +} + +#endif + /* * This can be used from process context to figure out what the value of * CR3 is without needing to do a (slow) __read_cr3(). @@ -1415,7 +1598,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_b= atch *batch) =20 int cpu =3D get_cpu(); =20 - info =3D get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false, 0); + info =3D get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false, 0, 0, 0); /* * flush_tlb_multi() is not optimized for the common case in which only * a local TLB flush is needed. 
Optimize this use-case by calling --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81196C433F5 for ; Wed, 23 Feb 2022 05:26:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238323AbiBWF1F (ORCPT ); Wed, 23 Feb 2022 00:27:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238316AbiBWF0V (ORCPT ); Wed, 23 Feb 2022 00:26:21 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B25F6D185 for ; Tue, 22 Feb 2022 21:25:03 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id k10-20020a056902070a00b0062469b00335so10829723ybt.14 for ; Tue, 22 Feb 2022 21:25:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=xuSOdR8cu+3sckZI73No+K9Gsgd7KYxoVcTvOLd9SRE=; b=L1vzi2HE8mlZkGyGTJ2UNkXNrCVKkhnCd6edxiG2679p05lL9RiITgedK2mHCr1gL1 r2NbbiBimY0XMB+ilTAP/tinO8mfPjfgIiKo0Vo4CIbvxKkOOdhmUUJYzmRPnVEAIELz CaVtVNSnDCTPdbHIF9MBinzjbvZUfNhf/kWzWIx9iGyWw1+hTpjVB6Q9fAmZ4GqOYOQ9 UfdgAMaldxCH3MN9zplE0ZKYVWhDkyZhstobQhDLPnFPNehmwZEzxYqfvR65uiELnTbL w+77nDxtgfJrLubYlcykpyqTJJAw4YhWcHGLjL5ZvkLqkeKvUc2iiWBXMc2g3lSez3xL mBMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=xuSOdR8cu+3sckZI73No+K9Gsgd7KYxoVcTvOLd9SRE=; b=sT60JryR5Q0DSa8IeDKBslmgV9o2yv0PPnvIMuAFBU7Bp8MPDvF7LmwIEVRj7s3CTC 
MDFEFHU6o/tneUSqsKPOFg2TtiXCxkuuCutvgy4CCZz+zyPidSc/7FfEFCr7pZ1E1wyp U6igPyIYoE+nlAhNPD0gHs5pnJMKFNx01565CahluJpNnHOHCkeIfJ7aG4oLDtojeCwL xAO7Aw3oIOJ+3AmyhXq22VSsQRRN8YOvpauU+t0Fl/fs6Iaj2xXTSH3qJ9+dzRoDjcwY DlUFgTRQ+RuGbT2pT+tMLd3D/SUyQQ6SrZrhvAbfXLnxtcFDjoOB6Vs2St4OBJeyZmLP HZTg== X-Gm-Message-State: AOAM533FAf+FRwkIl1x7TfINicR2oSqqEkHcdNcJWzlFN6ZSARB3zgzp NKoZLqCy8CaPdXk4gnim+NzLaoSzY2B/xISsOBeqHGcMHLbVAxoG7pS9F4VKkn77HG3DYwKhstW IL0vzOZATP06LwAhjLWcNAeWqdwiARHy5YPeAOta/daedmuAc+9qGU1NG9aYt0ylsQH5QUPNP X-Google-Smtp-Source: ABdhPJxz+92U+hDuWLEHSSBb08IRQ0vjXntk0j4pQJ0mXKWbSMLKP0n+iJJ9k2gkkoSoqaVm7kKAazHDcEee X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:aa2c:0:b0:624:64ce:8550 with SMTP id s41-20020a25aa2c000000b0062464ce8550mr16649367ybi.105.1645593890279; Tue, 22 Feb 2022 21:24:50 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:05 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-30-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 29/47] mm: asi: Reduce TLB flushes when freeing pages asynchronously From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When we are freeing pages asynchronously (because the original free was issued with IRQs disabled), issue only one TLB flush per execution of the async work function. If there is only one page to free, we do a targeted flush for that page only. 
Otherwise, we just do a full flush. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/tlbflush.h | 8 +++++ arch/x86/mm/tlb.c | 52 ++++++++++++++++++++------------- include/linux/mm_types.h | 30 +++++++++++++------ mm/page_alloc.c | 40 ++++++++++++++++++++----- 4 files changed, 93 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 85315d1d2d70..7d04aa2a5f86 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -296,6 +296,14 @@ unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, boo= l noflush); u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +void __asi_prepare_tlb_flush(struct asi *asi, u64 *new_tlb_gen); +void __asi_flush_tlb_range(u64 mm_context_id, u16 pcid_index, u64 new_tlb_= gen, + size_t start, size_t end, const cpumask_t *cpu_mask); + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif /* !MODULE */ =20 #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 2a442335501f..fcd2c8e92f83 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1302,21 +1302,10 @@ static bool is_asi_active_on_cpu(int cpu, void *inf= o) return per_cpu(asi_cpu_state.curr_asi, cpu); } =20 -void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +void __asi_prepare_tlb_flush(struct asi *asi, u64 *new_tlb_gen) { - size_t start =3D (size_t)addr; - size_t end =3D start + len; - struct flush_tlb_info *info; - u64 mm_context_id; - const cpumask_t *cpu_mask; - u64 new_tlb_gen =3D 0; - - if (!static_cpu_has(X86_FEATURE_ASI)) - return; - if (static_cpu_has(X86_FEATURE_PCID)) { - new_tlb_gen =3D atomic64_inc_return(asi->tlb_gen); - + *new_tlb_gen =3D atomic64_inc_return(asi->tlb_gen); /* * The increment of tlb_gen must happen before the curr_asi * reads in is_asi_active_on_cpu(). 
That ensures that if another @@ -1326,8 +1315,35 @@ void asi_flush_tlb_range(struct asi *asi, void *addr= , size_t len) */ smp_mb__after_atomic(); } +} + +void __asi_flush_tlb_range(u64 mm_context_id, u16 pcid_index, u64 new_tlb_= gen, + size_t start, size_t end, const cpumask_t *cpu_mask) +{ + struct flush_tlb_info *info; =20 preempt_disable(); + info =3D get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, + mm_context_id, pcid_index); + + on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, + true, cpu_mask); + put_flush_tlb_info(); + preempt_enable(); +} + +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + size_t start =3D (size_t)addr; + size_t end =3D start + len; + u64 mm_context_id; + u64 new_tlb_gen =3D 0; + const cpumask_t *cpu_mask; + + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + __asi_prepare_tlb_flush(asi, &new_tlb_gen); =20 if (asi =3D=3D ASI_GLOBAL_NONSENSITIVE) { mm_context_id =3D U64_MAX; @@ -1337,14 +1353,8 @@ void asi_flush_tlb_range(struct asi *asi, void *addr= , size_t len) cpu_mask =3D mm_cpumask(asi->mm); } =20 - info =3D get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, - mm_context_id, asi->pcid_index); - - on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, - true, cpu_mask); - - put_flush_tlb_info(); - preempt_enable(); + __asi_flush_tlb_range(mm_context_id, asi->pcid_index, new_tlb_gen, + start, end, cpu_mask); } =20 #endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 56511adc263e..7d38229ca85c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -193,21 +193,33 @@ struct page { /** @rcu_head: You can use this to free a page by RCU. 
*/ struct rcu_head rcu_head; =20 -#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#if defined(CONFIG_ADDRESS_SPACE_ISOLATION) && !defined(BUILD_VDSO32) struct { /* Links the pages_to_free_async list */ struct llist_node async_free_node; =20 unsigned long _asi_pad_1; - unsigned long _asi_pad_2; + u64 asi_tlb_gen; =20 - /* - * Upon allocation of a locally non-sensitive page, set - * to the allocating mm. Must be set to the same mm when - * the page is freed. May potentially be overwritten in - * the meantime, as long as it is restored before free. - */ - struct mm_struct *asi_mm; + union { + /* + * Upon allocation of a locally non-sensitive + * page, set to the allocating mm. Must be set + * to the same mm when the page is freed. May + * potentially be overwritten in the meantime, + * as long as it is restored before free. + */ + struct mm_struct *asi_mm; + + /* + * Set to the above mm's context ID if the page + * is being freed asynchronously. Can't directly + * use the mm_struct, unless we take additional + * steps to avoid it from being freed while the + * async work is pending. + */ + u64 asi_mm_ctx_id; + }; }; #endif }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 01784bff2a80..998ff6a56732 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5182,20 +5182,41 @@ static void async_free_work_fn(struct work_struct *= work) { struct page *page, *tmp; struct llist_node *pages_to_free; - void *va; - size_t len; + size_t addr; uint order; =20 pages_to_free =3D llist_del_all(this_cpu_ptr(&pages_to_free_async)); =20 - /* A later patch will do a more optimized TLB flush. */ + if (!pages_to_free) + return; + + /* If we only have one page to free, then do a targeted TLB flush. 
*/ + if (!llist_next(pages_to_free)) { + page =3D llist_entry(pages_to_free, struct page, async_free_node); + addr =3D (size_t)page_to_virt(page); + order =3D page->private; + + __asi_flush_tlb_range(page->asi_mm_ctx_id, 0, page->asi_tlb_gen, + addr, addr + PAGE_SIZE * (1 << order), + cpu_online_mask); + /* Need to clear, since it shares space with page->mapping. */ + page->asi_tlb_gen =3D 0; + + __free_the_page(page, order); + return; + } + + /* + * Otherwise, do a full flush. We could potentially try to optimize it + * via taking a union of what needs to be flushed, but it may not be + * worth the additional complexity. + */ + asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, 0, TLB_FLUSH_ALL); =20 llist_for_each_entry_safe(page, tmp, pages_to_free, async_free_node) { - va =3D page_to_virt(page); order =3D page->private; - len =3D PAGE_SIZE * (1 << order); - - asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, va, len); + /* Need to clear, since it shares space with page->mapping. */ + page->asi_tlb_gen =3D 0; __free_the_page(page, order); } } @@ -5291,6 +5312,11 @@ static bool asi_unmap_freed_pages(struct page *page,= unsigned int order) if (!async_flush_needed) return true; =20 + page->asi_mm_ctx_id =3D PageGlobalNonSensitive(page) + ? 
U64_MAX : asi->mm->context.ctx_id; + + __asi_prepare_tlb_flush(asi, &page->asi_tlb_gen); + page->private =3D order; llist_add(&page->async_free_node, this_cpu_ptr(&pages_to_free_async)); =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 890A8C433F5 for ; Wed, 23 Feb 2022 05:26:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238345AbiBWF1O (ORCPT ); Wed, 23 Feb 2022 00:27:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238361AbiBWF0i (ORCPT ); Wed, 23 Feb 2022 00:26:38 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCD876D876 for ; Tue, 22 Feb 2022 21:25:04 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id s133-20020a252c8b000000b0062112290d0bso26613831ybs.23 for ; Tue, 22 Feb 2022 21:25:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=vK4stcDcFWOmnS7XFdl48O2PCgqnR6b+h+9UyqGSKHo=; b=WeE4/kJf1JBwoHaoRE1hNnNZMWwTToI7DEJHTID7WIxYsRDDOQgqC+1Tt7v7oX0Gpi ZmRLZAZ2vDdsuIDB2S8uWUw1YsFiIl0loYVOvIAhqXAjeeol/TFwLp8ORGxcWp10Znnv Fm89xpJAHuDIyCGkiR6qSwg/Ed2RYqbaYiZbeHx1AcGw15lQM8m5E4AwvqrpiEZ4xaWB fo5SYVGmwmAcFx/4QD/6NyE3+JjhT+8OXDlEZf2AotxVIkqYFj/Rh40KkPJ8CI4n5jQK 7ZdRdvip/O5Och4s64bR2f4QDQ48Fke+i6WKvHWJfT9D9ayGbTZXm6UK8dYn9Je6DvRz f9cA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; 
bh=vK4stcDcFWOmnS7XFdl48O2PCgqnR6b+h+9UyqGSKHo=; b=zNBt5ICpIYD9jEFiol0rWYMMUfm58V2elVb0wcV0FAe+iZ9sjCQM2KalkxdLklBDlt ORmqSGnRQl8vxUHtEyhqtS69fHz00vxK0L0TLm3OQ12io0UX769G5SqbAART3TrC+rJj Q9vqfIKPZRjzyOKPPyUlBJMHqkjj9SShegdvX2fjmSLgeQEA3B7AuKCqEu5nWla12On0 NAKvLMHLfNZykDyLlDSFxfT1scWUQo68R5UVp52WdtRRSsIKt7jWPrbbQU8E+rCYjkp0 U/Qg/NcKahrQtlvpNJXVg4I7+VJ/dpiQVLq6qa7s2dYAzTpK8VrdQgCROhzmYNawZOfZ LSPA== X-Gm-Message-State: AOAM531Iq9mnIb0y4mazO+xwYpKFpXJICYr3J3tt5W/+KFVTv47wHvwL PRvBYUsGvMoX5Wbg/MqNXAY225+LaoFZCvS1pDMdq5mT1kJy8Yaq5ZPEus7+Bk8M/Vvm2ZGh2E9 m5Zqp5Y6jVXRyYJoaxgh8YDl5DvPMEEwgBpwkt8XnferGn2q8JL7NgswCp7ZpOifF2ZV+EIga X-Google-Smtp-Source: ABdhPJyP9PANVGMHQ1NQcOMu/RGh/MDmkswDGY7BrfXZ28lpcuzIVCL0D2Bn/aYlOJdsoN0ZdyV9/jnSWe04 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a05:6902:543:b0:61d:c152:bd19 with SMTP id z3-20020a056902054300b0061dc152bd19mr27379968ybs.377.1645593892611; Tue, 22 Feb 2022 21:24:52 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:06 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-31-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 30/47] mm: asi: Add API for mapping userspace address ranges From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" asi_map_user()/asi_unmap_user() can be used to map userspace address ranges for ASI classes that do not specify ASI_MAP_ALL_USERSPACE. 
In addition, another structure, asi_pgtbl_pool, allows for pre-allocating a set of pages to avoid having to allocate memory for page tables within asi_map_user(), which makes it easier to use that function while holding locks. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 19 +++ arch/x86/mm/asi.c | 252 ++++++++++++++++++++++++++++++++++--- include/asm-generic/asi.h | 21 ++++ include/linux/mm_types.h | 2 +- 4 files changed, 275 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 35421356584b..bdb2f70d4f85 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -44,6 +44,12 @@ struct asi { atomic64_t *tlb_gen; atomic64_t __tlb_gen; int64_t asi_ref_count; + rwlock_t user_map_lock; +}; + +struct asi_pgtbl_pool { + struct page *pgtbl_list; + uint count; }; =20 DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); @@ -74,6 +80,19 @@ void asi_do_lazy_map(struct asi *asi, size_t addr); void asi_clear_user_pgd(struct mm_struct *mm, size_t addr); void asi_clear_user_p4d(struct mm_struct *mm, size_t addr); =20 +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end); +void asi_unmap_user(struct asi *asi, void *va, size_t len); +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t fl= ags); +void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool); + +static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) +{ + pool->pgtbl_list =3D NULL; + pool->count =3D 0; +} + static inline void asi_init_thread_state(struct thread_struct *thread) { thread->intr_nest_depth =3D 0; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 29c74b6d4262..9b1bd005f343 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -86,6 +86,55 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); =20 +static ulong get_pgtbl_from_pool(struct asi_pgtbl_pool *pool) +{ + struct page 
*pgtbl; + + if (pool->count =3D=3D 0) + return 0; + + pgtbl =3D pool->pgtbl_list; + pool->pgtbl_list =3D pgtbl->asi_pgtbl_pool_next; + pgtbl->asi_pgtbl_pool_next =3D NULL; + pool->count--; + + return (ulong)page_address(pgtbl); +} + +static void return_pgtbl_to_pool(struct asi_pgtbl_pool *pool, ulong virt) +{ + struct page *pgtbl =3D virt_to_page(virt); + + pgtbl->asi_pgtbl_pool_next =3D pool->pgtbl_list; + pool->pgtbl_list =3D pgtbl; + pool->count++; +} + +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t fla= gs) +{ + if (!static_cpu_has(X86_FEATURE_ASI)) + return 0; + + while (pool->count < count) { + ulong pgtbl =3D get_zeroed_page(flags); + + if (!pgtbl) + return -ENOMEM; + + return_pgtbl_to_pool(pool, pgtbl); + } + + return 0; +} +EXPORT_SYMBOL_GPL(asi_fill_pgtbl_pool); + +void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool) +{ + while (pool->count > 0) + free_page(get_pgtbl_from_pool(pool)); +} +EXPORT_SYMBOL_GPL(asi_clear_pgtbl_pool); + static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) { pgd_t *src =3D pgd_offset_pgd(src_table, addr); @@ -110,10 +159,12 @@ static void asi_clone_pgd(pgd_t *dst_table, pgd_t *sr= c_table, size_t addr) #define DEFINE_ASI_PGTBL_ALLOC(base, level) \ static level##_t * asi_##level##_alloc(struct asi *asi, \ base##_t *base, ulong addr, \ - gfp_t flags) \ + gfp_t flags, \ + struct asi_pgtbl_pool *pool) \ { \ if (unlikely(base##_none(*base))) { \ - ulong pgtbl =3D get_zeroed_page(flags); \ + ulong pgtbl =3D pool ? get_pgtbl_from_pool(pool) \ + : get_zeroed_page(flags); \ phys_addr_t pgtbl_pa; \ \ if (pgtbl =3D=3D 0) \ @@ -127,7 +178,10 @@ static level##_t * asi_##level##_alloc(struct asi *asi= , \ mm_inc_nr_##level##s(asi->mm); \ } else { \ paravirt_release_##level(PHYS_PFN(pgtbl_pa)); \ - free_page(pgtbl); \ + if (pool) \ + return_pgtbl_to_pool(pool, pgtbl); \ + else \ + free_page(pgtbl); \ } \ \ /* NOP on native. PV call on Xen. 
*/ \ @@ -336,6 +390,7 @@ int asi_init(struct mm_struct *mm, int asi_index, struc= t asi **out_asi) asi->class =3D &asi_class[asi_index]; asi->mm =3D mm; asi->pcid_index =3D asi_index; + rwlock_init(&asi->user_map_lock); =20 if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; @@ -650,11 +705,6 @@ static bool follow_physaddr(struct mm_struct *mm, size= _t virt, /* * Map the given range into the ASI page tables. The source of the mapping * is the regular unrestricted page tables. - * Can be used to map any kernel memory. - * - * The caller MUST ensure that the source mapping will not change during t= his - * function. For dynamic kernel memory, this is generally ensured by mappi= ng - * the memory within the allocator. * * If the source mapping is a large page and the range being mapped spans = the * entire large page, then it will be mapped as a large page in the ASI pa= ge @@ -664,19 +714,17 @@ static bool follow_physaddr(struct mm_struct *mm, siz= e_t virt, * destination page, but that should be ok for now, as usually in such cas= es, * the range would consist of a small-ish number of pages. 
*/ -int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +int __asi_map(struct asi *asi, size_t start, size_t end, gfp_t gfp_flags, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) { size_t virt; - size_t start =3D (size_t)addr; - size_t end =3D start + len; size_t page_size; =20 - if (!static_cpu_has(X86_FEATURE_ASI) || !asi) - return 0; - VM_BUG_ON(start & ~PAGE_MASK); - VM_BUG_ON(len & ~PAGE_MASK); - VM_BUG_ON(start < TASK_SIZE_MAX); + VM_BUG_ON(end & ~PAGE_MASK); + VM_BUG_ON(end > allowed_end); + VM_BUG_ON(start < allowed_start); =20 gfp_flags &=3D GFP_RECLAIM_MASK; =20 @@ -702,14 +750,15 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t l= en, gfp_t gfp_flags) continue; \ } \ \ - level =3D asi_##level##_alloc(asi, base, virt, gfp_flags);\ + level =3D asi_##level##_alloc(asi, base, virt, \ + gfp_flags, pool); \ if (!level) \ return -ENOMEM; \ \ if (page_size >=3D LEVEL##_SIZE && \ (level##_none(*level) || level##_leaf(*level)) && \ is_page_within_range(virt, LEVEL##_SIZE, \ - start, end)) { \ + allowed_start, allowed_end)) {\ page_size =3D LEVEL##_SIZE; \ phys &=3D LEVEL##_MASK; \ \ @@ -737,6 +786,26 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t le= n, gfp_t gfp_flags) return 0; } =20 +/* + * Maps the given kernel address range into the ASI page tables. + * + * The caller MUST ensure that the source mapping will not change during t= his + * function. For dynamic kernel memory, this is generally ensured by mappi= ng + * the memory within the allocator. 
+ */ +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +{ + size_t start =3D (size_t)addr; + size_t end =3D start + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return 0; + + VM_BUG_ON(start < TASK_SIZE_MAX); + + return __asi_map(asi, start, end, gfp_flags, NULL, start, end); +} + int asi_map(struct asi *asi, void *addr, size_t len) { return asi_map_gfp(asi, addr, len, GFP_KERNEL); @@ -935,3 +1004,150 @@ void asi_clear_user_p4d(struct mm_struct *mm, size_t= addr) if (!pgtable_l5_enabled()) __asi_clear_user_pgd(mm, addr); } + +/* + * Maps the given userspace address range into the ASI page tables. + * + * The caller MUST ensure that the source mapping will not change during t= his + * function e.g. by synchronizing via MMU notifiers or acquiring the + * appropriate locks. + */ +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) +{ + int err; + size_t start =3D (size_t)addr; + size_t end =3D start + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return 0; + + VM_BUG_ON(end > TASK_SIZE_MAX); + + read_lock(&asi->user_map_lock); + err =3D __asi_map(asi, start, end, GFP_NOWAIT, pool, + allowed_start, allowed_end); + read_unlock(&asi->user_map_lock); + + return err; +} +EXPORT_SYMBOL_GPL(asi_map_user); + +static bool +asi_unmap_free_pte_range(struct asi_pgtbl_pool *pgtbls_to_free, + pte_t *pte, size_t addr, size_t end) +{ + do { + pte_clear(NULL, addr, pte); + } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); + + return true; +} + +#define DEFINE_ASI_UNMAP_FREE_RANGE(level, LEVEL, next_level, NEXT_LVL_SIZ= E) \ +static bool \ +asi_unmap_free_##level##_range(struct asi_pgtbl_pool *pgtbls_to_free, = \ + level##_t *level, size_t addr, size_t end) \ +{ \ + bool unmapped =3D false; \ + size_t next; \ + \ + do { \ + next =3D level##_addr_end(addr, end); \ + if (level##_none(*level)) \ + continue; \ + \ + if (IS_ALIGNED(addr, LEVEL##_SIZE) && \ + 
IS_ALIGNED(next, LEVEL##_SIZE)) { \ + if (!level##_large(*level)) { \ + ulong pgtbl =3D level##_page_vaddr(*level); \ + struct page *page =3D virt_to_page(pgtbl); \ + \ + page->private =3D PG_LEVEL_##NEXT_LVL_SIZE; \ + return_pgtbl_to_pool(pgtbls_to_free, pgtbl); \ + } \ + level##_clear(level); \ + unmapped =3D true; \ + } else { \ + /* \ + * At this time, we don't have a case where we need to \ + * unmap a subset of a huge page. But that could arise \ + * in the future. In that case, we'll need to split \ + * the huge mapping here. \ + */ \ + if (WARN_ON(level##_large(*level))) \ + continue; \ + \ + unmapped |=3D asi_unmap_free_##next_level##_range( \ + pgtbls_to_free, \ + next_level##_offset(level, addr), \ + addr, next); \ + } \ + } while (level++, addr =3D next, addr !=3D end); \ + \ + return unmapped; \ +} + +DEFINE_ASI_UNMAP_FREE_RANGE(pmd, PMD, pte, 4K) +DEFINE_ASI_UNMAP_FREE_RANGE(pud, PUD, pmd, 2M) +DEFINE_ASI_UNMAP_FREE_RANGE(p4d, P4D, pud, 1G) +DEFINE_ASI_UNMAP_FREE_RANGE(pgd, PGDIR, p4d, 512G) + +static bool asi_unmap_and_free_range(struct asi_pgtbl_pool *pgtbls_to_free, + struct asi *asi, size_t addr, size_t end) +{ + size_t next; + bool unmapped =3D false; + pgd_t *pgd =3D pgd_offset_pgd(asi->pgd, addr); + + BUILD_BUG_ON((void *)&((struct page *)NULL)->private =3D=3D + (void *)&((struct page *)NULL)->asi_pgtbl_pool_next); + + if (pgtable_l5_enabled()) + return asi_unmap_free_pgd_range(pgtbls_to_free, pgd, addr, end); + + do { + next =3D pgd_addr_end(addr, end); + unmapped |=3D asi_unmap_free_p4d_range(pgtbls_to_free, + p4d_offset(pgd, addr), + addr, next); + } while (pgd++, addr =3D next, addr !=3D end); + + return unmapped; +} + +void asi_unmap_user(struct asi *asi, void *addr, size_t len) +{ + static void (*const free_pgtbl_at_level[])(struct asi *, size_t) =3D { + NULL, + asi_free_pte, + asi_free_pmd, + asi_free_pud, + asi_free_p4d + }; + + struct asi_pgtbl_pool pgtbls_to_free =3D { 0 }; + size_t start =3D (size_t)addr; + size_t end =3D start + 
len; + bool unmapped; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + write_lock(&asi->user_map_lock); + unmapped =3D asi_unmap_and_free_range(&pgtbls_to_free, asi, start, end); + write_unlock(&asi->user_map_lock); + + if (unmapped) + asi_flush_tlb_range(asi, addr, len); + + while (pgtbls_to_free.count > 0) { + size_t pgtbl =3D get_pgtbl_from_pool(&pgtbls_to_free); + struct page *page =3D virt_to_page(pgtbl); + + VM_BUG_ON(page->private >=3D PG_LEVEL_NUM); + free_pgtbl_at_level[page->private](asi, pgtbl); + } +} +EXPORT_SYMBOL_GPL(asi_unmap_user); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 8513d0d7865a..fffb323d2a00 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -26,6 +26,7 @@ =20 struct asi_hooks {}; struct asi {}; +struct asi_pgtbl_pool {}; =20 static inline int asi_register_class(const char *name, uint flags, @@ -92,6 +93,26 @@ void asi_clear_user_pgd(struct mm_struct *mm, size_t add= r) { } static inline void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) { } =20 +static inline +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) +{ + return 0; +} + +static inline void asi_unmap_user(struct asi *asi, void *va, size_t len) {= } + +static inline +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t fla= gs) +{ + return 0; +} + +static inline void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool) { } + +static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } =20 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7d38229ca85c..c3f209720a84 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -198,7 +198,7 @@ struct page { /* Links the pages_to_free_async list */ struct llist_node async_free_node; =20 - unsigned long _asi_pad_1; + struct page 
*asi_pgtbl_pool_next; u64 asi_tlb_gen; =20 union { --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE787C433EF for ; Wed, 23 Feb 2022 05:28:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238417AbiBWF24 (ORCPT ); Wed, 23 Feb 2022 00:28:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237690AbiBWF0l (ORCPT ); Wed, 23 Feb 2022 00:26:41 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B2836E784 for ; Tue, 22 Feb 2022 21:25:08 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d74a0ff060so75807097b3.6 for ; Tue, 22 Feb 2022 21:25:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Wjvyoa2FJqkBAKFP17bCuJdeRktO4ffR+raoUv+wC04=; b=IuC+day3sN1PmeI3q0+DeqehWlmR6vmu4lqtMz2zTkSs/sLWgZjKSMjbDxjarGYK3w hUZZQKn00coi8l0R0gTq63im2iXgDqy8kS7ZoG/GfBb24X2etuAVLlEbTVnHHQVOpffA 7kJw+tCwaYdWlS1krugMDGNM7sJ3OTIDt7ytbn7/z+HK1a8ew2aSceVwKePXDv9f7WZz m0Q/zwZ6fjCsUI3xB5Psf0pqjVkAy2TXCzeSjVt+m4RXqs776AhpaQ80FhrK9Kk/bCDo Sz1kG0HnL9LnI2+RNBaFmgg3coHxCTDPj+PGc/Rc5iXJGNpAzaSB6KONLlzImp2qWQaO osLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Wjvyoa2FJqkBAKFP17bCuJdeRktO4ffR+raoUv+wC04=; b=4Ho6lkXTE1XssdW/JHnW1EtUKYaGKJQyTNIKeB728rnanxQNrKd2040dZmcXbgOKhU 
V5KCpMKV1FPIye/xIYUZpAh78NJRur9Tad9um5KPsqOuKITkBXrPmpzWnIbiovv2LEKm 8nwlqzuc/3Afifb3P6h11SVhCAKn1Z77RXhzCxBgpL2TM/l9T5nNDRQbw+ZeE+G71Er7 a1qenSnnDKKZ+pk5k3fiBG/YAKmnFmPKwFclT3cGqD4OHHuo+Vi3ZyRW+HsPHfOBF0/U ShBB5wRFvJXd294IVuLcnao+aGeaHBqecWAzpMkSIp2gMvE1BNnEUwbzU44K9aRZVg7A Wvzg== X-Gm-Message-State: AOAM531Yz3aDRvkLbroC7qW9Aa69W1MR9uPW31X0cwuzTImaSJ3HgW2O cIEV49Tjz+Z7xkReSlsWgG5IE4vrohJ94AfcXg4MjYkksJHw3f0FHAl7pr6WqEXJDGY4df3UKIM rv6zQqxlYljefYRfQYeyyUg+A6MCF9kdO+NUbsybaLdBSIw9zPQTCB+fTAOIiDmjE2PLQepna X-Google-Smtp-Source: ABdhPJyc5lR4Rjfv+Xgr2/PVDwYzQX3XdlHXcXxiuT59EhJPeYW9yxuIt2bJoEcFUWJJKvKStyvmN4Ecr+Gy X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:e4c2:0:b0:2d4:da21:cc07 with SMTP id n185-20020a0de4c2000000b002d4da21cc07mr27147139ywe.16.1645593894760; Tue, 22 Feb 2022 21:24:54 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:07 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-32-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 31/47] mm: asi: Support for non-sensitive SLUB caches From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds support for allocating global and local non-sensitive objects using the SLUB allocator. Similar to SLAB, per-process child caches are created for locally non-sensitive allocations. 
This mechanism is based on a modified form of the earlier implementation of per-memcg caches. Signed-off-by: Junaid Shahid --- include/linux/slub_def.h | 6 ++ mm/slab.h | 5 ++ mm/slab_common.c | 33 +++++++-- mm/slub.c | 140 ++++++++++++++++++++++++++++++++++++++- security/Kconfig | 3 +- 5 files changed, 179 insertions(+), 8 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 0fa751b946fa..6e185b61582c 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -137,6 +137,12 @@ struct kmem_cache { struct kasan_cache kasan_info; #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_local_cache_info local_cache_info; + /* For propagation, maximum size of a stored attr */ + unsigned int max_attr_size; +#endif + unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ =20 diff --git a/mm/slab.h b/mm/slab.h index b9e11038be27..8799bcdd2fff 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -216,6 +216,7 @@ int __kmem_cache_shutdown(struct kmem_cache *); void __kmem_cache_release(struct kmem_cache *); int __kmem_cache_shrink(struct kmem_cache *); void slab_kmem_cache_release(struct kmem_cache *); +void kmem_cache_shrink_all(struct kmem_cache *s); =20 struct seq_file; struct file; @@ -344,6 +345,7 @@ void restore_page_nonsensitive_metadata(struct page *pa= ge, } =20 void set_nonsensitive_cache_params(struct kmem_cache *s); +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root); =20 #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 @@ -380,6 +382,9 @@ static inline void restore_page_nonsensitive_metadata(s= truct page *page, =20 static inline void set_nonsensitive_cache_params(struct kmem_cache *s) { } =20 +static inline +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) = { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ =20 #ifdef CONFIG_MEMCG_KMEM diff --git a/mm/slab_common.c b/mm/slab_common.c index b486b72d6344..efa61b97902a 100644 --- 
a/mm/slab_common.c +++ b/mm/slab_common.c @@ -142,7 +142,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t= flags, size_t nr, =20 LIST_HEAD(slab_root_caches); =20 -static void init_local_cache_info(struct kmem_cache *s, struct kmem_cache = *root) +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) { if (root) { s->local_cache_info.root_cache =3D root; @@ -194,9 +194,6 @@ void set_nonsensitive_cache_params(struct kmem_cache *s) =20 #else =20 -static inline -void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) = { } - static inline void cleanup_local_cache_info(struct kmem_cache *s) { } =20 #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ @@ -644,6 +641,34 @@ int kmem_cache_shrink(struct kmem_cache *cachep) } EXPORT_SYMBOL(kmem_cache_shrink); =20 +/** + * kmem_cache_shrink_all - shrink a cache and all child caches for root ca= che + * @s: The cache pointer + */ +void kmem_cache_shrink_all(struct kmem_cache *s) +{ + struct kmem_cache *c; + + if (!static_asi_enabled() || !is_root_cache(s)) { + kmem_cache_shrink(s); + return; + } + + kasan_cache_shrink(s); + __kmem_cache_shrink(s); + + /* + * We have to take the slab_mutex to protect from the child cache list + * modification. 
+ */ + mutex_lock(&slab_mutex); + for_each_child_cache(c, s) { + kasan_cache_shrink(c); + __kmem_cache_shrink(c); + } + mutex_unlock(&slab_mutex); +} + bool slab_is_available(void) { return slab_state >=3D UP; diff --git a/mm/slub.c b/mm/slub.c index abe7db581d68..df0191f8b0e2 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -289,6 +289,21 @@ static void debugfs_slab_add(struct kmem_cache *); static inline void debugfs_slab_add(struct kmem_cache *s) { } #endif =20 +#if defined(CONFIG_SYSFS) && defined(CONFIG_ADDRESS_SPACE_ISOLATION) +static void propagate_slab_attrs_from_parent(struct kmem_cache *s); +static void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len); +#else +static inline void propagate_slab_attrs_from_parent(struct kmem_cache *s) = { } + +static inline +void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len) +{ } +#endif + static inline void stat(const struct kmem_cache *s, enum stat_item si) { #ifdef CONFIG_SLUB_STATS @@ -2015,6 +2030,7 @@ static void __free_slab(struct kmem_cache *s, struct = page *page) if (current->reclaim_state) current->reclaim_state->reclaimed_slab +=3D pages; unaccount_slab_page(page, order, s); + restore_page_nonsensitive_metadata(page, s); __free_pages(page, order); } =20 @@ -4204,6 +4220,8 @@ static int kmem_cache_open(struct kmem_cache *s, slab= _flags_t flags) } } =20 + set_nonsensitive_cache_params(s); + #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \ defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) if (system_has_cmpxchg_double() && (s->flags & SLAB_NO_CMPXCHG) =3D=3D 0) @@ -4797,6 +4815,10 @@ static struct kmem_cache * __init bootstrap(struct k= mem_cache *static_cache) #endif } list_add(&s->list, &slab_caches); + init_local_cache_info(s, NULL); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + list_del(&static_cache->root_caches_node); +#endif return s; } =20 @@ -4863,7 +4885,7 @@ struct kmem_cache * __kmem_cache_alias(const 
char *name, unsigned int size, unsigned int align, slab_flags_t flags, void (*ctor)(void *)) { - struct kmem_cache *s; + struct kmem_cache *s, *c; =20 s =3D find_mergeable(size, align, flags, name, ctor); if (s) { @@ -4876,6 +4898,11 @@ __kmem_cache_alias(const char *name, unsigned int si= ze, unsigned int align, s->object_size =3D max(s->object_size, size); s->inuse =3D max(s->inuse, ALIGN(size, sizeof(void *))); =20 + for_each_child_cache(c, s) { + c->object_size =3D s->object_size; + c->inuse =3D max(c->inuse, ALIGN(size, sizeof(void *))); + } + if (sysfs_slab_alias(s, name)) { s->refcount--; s =3D NULL; @@ -4889,6 +4916,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_fl= ags_t flags) { int err; =20 + if (!static_asi_enabled()) + flags &=3D ~SLAB_NONSENSITIVE; + err =3D kmem_cache_open(s, flags); if (err) return err; @@ -4897,6 +4927,8 @@ int __kmem_cache_create(struct kmem_cache *s, slab_fl= ags_t flags) if (slab_state <=3D UP) return 0; =20 + propagate_slab_attrs_from_parent(s); + err =3D sysfs_slab_add(s); if (err) { __kmem_cache_release(s); @@ -5619,7 +5651,7 @@ static ssize_t shrink_store(struct kmem_cache *s, const char *buf, size_t length) { if (buf[0] =3D=3D '1') - kmem_cache_shrink(s); + kmem_cache_shrink_all(s); else return -EINVAL; return length; @@ -5829,6 +5861,87 @@ static ssize_t slab_attr_show(struct kobject *kobj, return err; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static void propagate_slab_attrs_from_parent(struct kmem_cache *s) +{ + int i; + char *buffer =3D NULL; + struct kmem_cache *root_cache; + + if (is_root_cache(s)) + return; + + root_cache =3D s->local_cache_info.root_cache; + + /* + * This mean this cache had no attribute written. 
Therefore, no point + * in copying default values around + */ + if (!root_cache->max_attr_size) + return; + + for (i =3D 0; i < ARRAY_SIZE(slab_attrs); i++) { + char mbuf[64]; + char *buf; + struct slab_attribute *attr =3D to_slab_attr(slab_attrs[i]); + ssize_t len; + + if (!attr || !attr->store || !attr->show) + continue; + + /* + * It is really bad that we have to allocate here, so we will + * do it only as a fallback. If we actually allocate, though, + * we can just use the allocated buffer until the end. + * + * Most of the slub attributes will tend to be very small in + * size, but sysfs allows buffers up to a page, so they can + * theoretically happen. + */ + if (buffer) { + buf =3D buffer; + } else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) && + !IS_ENABLED(CONFIG_SLUB_STATS)) { + buf =3D mbuf; + } else { + buffer =3D (char *)get_zeroed_page(GFP_KERNEL); + if (WARN_ON(!buffer)) + continue; + buf =3D buffer; + } + + len =3D attr->show(root_cache, buf); + if (len > 0) + attr->store(s, buf, len); + } + + if (buffer) + free_page((unsigned long)buffer); +} + +static void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len) +{ + struct kmem_cache *c; + struct slab_attribute *attribute =3D to_slab_attr(attr); + + if (static_asi_enabled()) { + mutex_lock(&slab_mutex); + + if (s->max_attr_size < len) + s->max_attr_size =3D len; + + for_each_child_cache(c, s) + attribute->store(c, buf, len); + + mutex_unlock(&slab_mutex); + } +} + +#endif + static ssize_t slab_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t len) @@ -5844,6 +5957,27 @@ static ssize_t slab_attr_store(struct kobject *kobj, return -EIO; =20 err =3D attribute->store(s, buf, len); + + /* + * This is a best effort propagation, so this function's return + * value will be determined by the parent cache only. 
This is
+	 * basically because not all attributes will have a well
+	 * defined semantics for rollbacks - most of the actions will
+	 * have permanent effects.
+	 *
+	 * Returning the error value of any of the children that fail
+	 * is not 100 % defined, in the sense that users seeing the
+	 * error code won't be able to know anything about the state of
+	 * the cache.
+	 *
+	 * Only returning the error code for the parent cache at least
+	 * has well defined semantics. The cache being written to
+	 * directly either failed or succeeded, in which case we loop
+	 * through the descendants with best-effort propagation.
+	 */
+	if (slab_state >= FULL && err >= 0 && is_root_cache(s))
+		propagate_slab_attr_to_children(s, attr, buf, len);
+
 	return err;
 }
 
@@ -5866,7 +6000,7 @@ static struct kset *slab_kset;
 
 static inline struct kset *cache_kset(struct kmem_cache *s)
 {
-	return slab_kset;
+	return is_root_cache(s) ? slab_kset : NULL;
 }
 
 #define ID_STR_LENGTH 64
diff --git a/security/Kconfig b/security/Kconfig
index 070a948b5266..a5cfb09352b0 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -68,7 +68,8 @@ config PAGE_TABLE_ISOLATION
 config ADDRESS_SPACE_ISOLATION
 	bool "Allow code to run with a reduced kernel address space"
 	default n
-	depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM
+	depends on X86_64 && !UML && !NEED_PER_CPU_KM
+	depends on SLAB || SLUB
 	depends on !PARAVIRT
 	depends on !MEMORY_HOTPLUG
 	help
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Date: Tue, 22 Feb 2022 21:22:08 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-33-junaids@google.com>
References: <20220223052223.1202152-1-junaids@google.com>
Subject: [RFC PATCH 32/47] x86: asi: Allocate FPU state separately when ASI
 is enabled.
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com,
 pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com,
 rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org,
 tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

We are going to be mapping the task_struct in the restricted ASI
address space. However, the task_struct also contains the FPU
register state embedded inside it, which can contain sensitive
information. So when ASI is enabled, always allocate the FPU
state from a separate slab cache to keep it out of task_struct.

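For illustration only, the idea above can be sketched in plain userspace C.
This is not the kernel code from this patch: struct fpu_sketch,
struct fpstate_sketch, the asi_enabled flag, and the helper names are all
hypothetical stand-ins, and malloc()/free() stand in for the dedicated slab
cache. It only shows the pointer-versus-embedded-buffer arrangement.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct fpstate. */
struct fpstate_sketch {
	unsigned int size;
	unsigned char regs[64];
};

/* Hypothetical stand-in for struct fpu: a pointer plus an embedded default. */
struct fpu_sketch {
	struct fpstate_sketch *fpstate;   /* state currently in use */
	struct fpstate_sketch embedded;   /* lives inside the task, used without ASI */
};

/* Assumption: stands in for cpu_feature_enabled(X86_FEATURE_ASI). */
static bool asi_enabled;

/* Give the child its own state: separately allocated with ASI, embedded otherwise. */
static int fpu_clone_sketch(struct fpu_sketch *dst, const struct fpu_sketch *src)
{
	if (asi_enabled) {
		/* malloc() stands in for an allocation from a dedicated cache. */
		dst->fpstate = malloc(sizeof(*dst->fpstate));
		if (!dst->fpstate)
			return -ENOMEM;
	} else {
		dst->fpstate = &dst->embedded;
	}

	memcpy(dst->fpstate, src->fpstate, sizeof(*dst->fpstate));
	return 0;
}

/* True when the register state still lives inside the task structure. */
static bool fpstate_is_embedded(const struct fpu_sketch *fpu)
{
	return fpu->fpstate == &fpu->embedded;
}

/* Release only separately allocated state; the embedded buffer is never freed. */
static void fpstate_free_sketch(struct fpu_sketch *fpu)
{
	if (fpu->fpstate && !fpstate_is_embedded(fpu))
		free(fpu->fpstate);
	fpu->fpstate = NULL;
}
```

With asi_enabled set, the register contents never sit inside the structure
that would be mapped into the restricted address space; only the pointer does.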
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/fpu/api.h | 1 + arch/x86/kernel/fpu/core.c | 45 ++++++++++++++++++++++++++++++++-- arch/x86/kernel/fpu/init.c | 7 ++++-- arch/x86/kernel/fpu/internal.h | 1 + arch/x86/kernel/fpu/xstate.c | 21 +++++++++++++--- arch/x86/kernel/process.c | 7 +++++- 6 files changed, 74 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index c2767a6a387e..6f5ca3c2ef4a 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -112,6 +112,7 @@ extern void fpu__init_cpu(void); extern void fpu__init_system(struct cpuinfo_x86 *c); extern void fpu__init_check_bugs(void); extern void fpu__resume_cpu(void); +extern void fpstate_cache_init(void); =20 #ifdef CONFIG_MATH_EMULATION extern void fpstate_init_soft(struct swregs_state *soft); diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 8ea306b1bf8e..d7859573973d 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -59,6 +59,8 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu); */ DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); =20 +struct kmem_cache *fpstate_cachep; + static bool kernel_fpu_disabled(void) { return this_cpu_read(in_kernel_fpu); @@ -443,7 +445,9 @@ static void __fpstate_reset(struct fpstate *fpstate) void fpstate_reset(struct fpu *fpu) { /* Set the fpstate pointer to the default fpstate */ - fpu->fpstate =3D &fpu->__fpstate; + if (!cpu_feature_enabled(X86_FEATURE_ASI)) + fpu->fpstate =3D &fpu->__fpstate; + __fpstate_reset(fpu->fpstate); =20 /* Initialize the permission related info in fpu */ @@ -464,6 +468,26 @@ static inline void fpu_inherit_perms(struct fpu *dst_f= pu) } } =20 +void fpstate_cache_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_ASI)) { + size_t fpstate_size; + + /* TODO: Is the ALIGN-64 really needed? 
*/ + fpstate_size =3D fpu_kernel_cfg.default_size + + ALIGN(offsetof(struct fpstate, regs), 64); + + fpstate_cachep =3D kmem_cache_create_usercopy( + "fpstate", + fpstate_size, + __alignof__(struct fpstate), + SLAB_PANIC | SLAB_ACCOUNT, + offsetof(struct fpstate, regs), + fpu_kernel_cfg.default_size, + NULL); + } +} + /* Clone current's FPU state on fork */ int fpu_clone(struct task_struct *dst, unsigned long clone_flags) { @@ -473,6 +497,22 @@ int fpu_clone(struct task_struct *dst, unsigned long c= lone_flags) /* The new task's FPU state cannot be valid in the hardware. */ dst_fpu->last_cpu =3D -1; =20 + if (cpu_feature_enabled(X86_FEATURE_ASI)) { + dst_fpu->fpstate =3D kmem_cache_alloc_node( + fpstate_cachep, GFP_KERNEL, + page_to_nid(virt_to_page(dst))); + if (!dst_fpu->fpstate) + return -ENOMEM; + + /* + * TODO: We may be able to skip the copy since the registers are + * restored below anyway. + */ + memcpy(dst_fpu->fpstate, src_fpu->fpstate, + fpu_kernel_cfg.default_size + + offsetof(struct fpstate, regs)); + } + fpstate_reset(dst_fpu); =20 if (!cpu_feature_enabled(X86_FEATURE_FPU)) @@ -531,7 +571,8 @@ int fpu_clone(struct task_struct *dst, unsigned long cl= one_flags) void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *siz= e) { *offset =3D offsetof(struct thread_struct, fpu.__fpstate.regs); - *size =3D fpu_kernel_cfg.default_size; + *size =3D cpu_feature_enabled(X86_FEATURE_ASI) + ? 0 : fpu_kernel_cfg.default_size; } =20 /* diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 621f4b6cac4a..8b722bf98135 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -161,9 +161,11 @@ static void __init fpu__init_task_struct_size(void) =20 /* * Add back the dynamically-calculated register state - * size. + * size, except when ASI is enabled, since in that case + * the FPU state is always allocated dynamically. 
 	 */
-	task_size += fpu_kernel_cfg.default_size;
+	if (!cpu_feature_enabled(X86_FEATURE_ASI))
+		task_size += fpu_kernel_cfg.default_size;
 
 	/*
 	 * We dynamically size 'struct fpu', so we require that
@@ -223,6 +225,7 @@ static void __init fpu__init_init_fpstate(void)
  */
 void __init fpu__init_system(struct cpuinfo_x86 *c)
 {
+	current->thread.fpu.fpstate = &current->thread.fpu.__fpstate;
 	fpstate_reset(&current->thread.fpu);
 	fpu__init_system_early_generic(c);
 
diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h
index dbdb31f55fc7..30acc7d0cb1a 100644
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -3,6 +3,7 @@
 #define __X86_KERNEL_FPU_INTERNAL_H
 
 extern struct fpstate init_fpstate;
+extern struct kmem_cache *fpstate_cachep;
 
 /* CPU feature check wrappers */
 static __always_inline __pure bool use_xsave(void)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d28829403ed0..96d12f351f19 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1495,8 +1496,15 @@ arch_initcall(xfd_update_static_branch)
 
 void fpstate_free(struct fpu *fpu)
 {
-	if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate)
-		vfree(fpu->fpstate);
+	WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_ASI) &&
+		     fpu->fpstate == &fpu->__fpstate);
+
+	if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate) {
+		if (fpu->fpstate->is_valloc)
+			vfree(fpu->fpstate);
+		else
+			kmem_cache_free(fpstate_cachep, fpu->fpstate);
+	}
 }
 
 /**
@@ -1574,7 +1582,14 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 
 	fpregs_unlock();
 
-	vfree(curfps);
+	WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_ASI) && !curfps);
+	if (curfps) {
+		if (curfps->is_valloc)
+			vfree(curfps);
+		else
+			kmem_cache_free(fpstate_cachep, curfps);
+	}
+
 	return 0;
 }
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c8d4a00a4de7..f9bd1c3415d4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -80,6 +80,11 @@ EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
 DEFINE_PER_CPU(bool, __tss_limit_invalid);
 EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
 
+void __init arch_task_cache_init(void)
+{
+	fpstate_cache_init();
+}
+
 /*
  * this gets called so that we can store lazy state into memory and copy the
  * current task into the new thread.
@@ -101,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #ifdef CONFIG_X86_64
 void arch_release_task_struct(struct task_struct *tsk)
 {
-	if (fpu_state_size_dynamic())
+	if (fpu_state_size_dynamic() || cpu_feature_enabled(X86_FEATURE_ASI))
 		fpstate_free(&tsk->thread.fpu);
 }
 #endif
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Date: Tue, 22 Feb 2022 21:22:09 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-34-junaids@google.com>
References: <20220223052223.1202152-1-junaids@google.com>
Subject: [RFC PATCH 33/47] kvm: asi: Map guest memory into restricted ASI
 address space
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com,
 pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com,
 rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org,
 tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

A module parameter treat_all_userspace_as_nonsensitive is added,
which if set, maps the entire userspace of the process running the VM
into the ASI restricted address space.

If the flag is not set (the default), then just the userspace memory
mapped into the VM's address space is mapped into the ASI restricted
address space.

Signed-off-by: Junaid Shahid
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.h              |  6 ++++
 arch/x86/kvm/mmu/mmu.c          | 54 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/paging_tmpl.h  | 14 +++++++++
 arch/x86/kvm/x86.c              | 19 +++++++++++-
 include/linux/kvm_host.h        |  3 ++
 virt/kvm/kvm_main.c             |  7 +++++
 7 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 98cbd6447e3e..e63a2f244d7b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -681,6 +681,8 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_gfn_array_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+	struct asi_pgtbl_pool asi_pgtbl_pool;
+
 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
 	 * In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 9ae6168d381e..60b84331007d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -49,6 +49,12 @@ =20 #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP) =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern bool treat_all_userspace_as_nonsensitive; +#else +#define treat_all_userspace_as_nonsensitive true +#endif + static __always_inline u64 rsvd_bits(int s, int e) { BUILD_BUG_ON(__builtin_constant_p(e) && __builtin_constant_p(s) && e < s); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index fcdf3f8bb59a..485c0ba3ce8b 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -91,6 +91,11 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "ui= nt"); static bool __read_mostly force_flush_and_sync_on_reuse; module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 06= 44); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +bool __ro_after_init treat_all_userspace_as_nonsensitive; +module_param(treat_all_userspace_as_nonsensitive, bool, 0444); +#endif + /* * When setting this variable to true it enables Two-Dimensional-Paging * where the hardware walks 2 page tables: @@ -2757,6 +2762,21 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struc= t kvm_memory_slot *slot, return ret; } =20 +static void asi_map_gfn_range(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *slot, + gfn_t gfn, size_t npages) +{ + int err; + size_t hva =3D __gfn_to_hva_memslot(slot, gfn); + + err =3D asi_map_user(vcpu->kvm->asi, (void *)hva, PAGE_SIZE * npages, + &vcpu->arch.asi_pgtbl_pool, slot->userspace_addr, + slot->userspace_addr + slot->npages * PAGE_SIZE); + if (err) + kvm_err("asi_map_user for %lx-%lx failed with code %d", hva, + hva + PAGE_SIZE * npages, err); +} + static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *start, u64 *end) @@ -2776,6 +2796,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *= vcpu, if (ret <=3D 0) return -1; =20 + if 
(!treat_all_userspace_as_nonsensitive) + asi_map_gfn_range(vcpu, slot, gfn, ret); + for (i =3D 0; i < ret; i++, gfn++, start++) { mmu_set_spte(vcpu, slot, start, access, gfn, page_to_pfn(pages[i]), NULL); @@ -3980,6 +4003,15 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault, return true; } =20 +static void vcpu_fill_asi_pgtbl_pool(struct kvm_vcpu *vcpu) +{ + int err =3D asi_fill_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool, + CONFIG_PGTABLE_LEVELS - 1, GFP_KERNEL); + + if (err) + kvm_err("asi_fill_pgtbl_pool failed with code %d", err); +} + /* * Returns true if the page fault is stale and needs to be retried, i.e. i= f the * root was invalidated by a memslot update or a relevant mmu_notifier fir= ed. @@ -4013,6 +4045,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault bool is_tdp_mmu_fault =3D is_tdp_mmu(vcpu->arch.mmu); =20 unsigned long mmu_seq; + bool try_asi_map; int r; =20 fault->gfn =3D fault->addr >> PAGE_SHIFT; @@ -4038,6 +4071,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, = struct kvm_page_fault *fault if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r)) return r; =20 + try_asi_map =3D !treat_all_userspace_as_nonsensitive && + !is_noslot_pfn(fault->pfn); + + if (try_asi_map) + vcpu_fill_asi_pgtbl_pool(vcpu); + r =3D RET_PF_RETRY; =20 if (is_tdp_mmu_fault) @@ -4052,6 +4091,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault if (r) goto out_unlock; =20 + if (try_asi_map) + asi_map_gfn_range(vcpu, fault->slot, fault->gfn, 1); + if (is_tdp_mmu_fault) r =3D kvm_tdp_mmu_map(vcpu, fault); else @@ -5584,6 +5626,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.nested_mmu.translate_gpa =3D translate_nested_gpa; =20 + asi_init_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); + ret =3D __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu); if (ret) return ret; @@ -5713,6 +5757,15 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(= struct kvm *kvm, struct 
kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { + /* + * Currently, we just zap the entire address range, instead of only the + * memslot. So we also just asi_unmap the entire userspace. But in the + * future, if we zap only the range belonging to the memslot, then we + * should also asi_unmap only that range. + */ + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, 0, TASK_SIZE_MAX); + kvm_mmu_zap_all_fast(kvm); } =20 @@ -6194,6 +6247,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu) free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); + asi_clear_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); } =20 void kvm_mmu_module_exit(void) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 708a5d297fe1..193317ad60a4 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -584,6 +584,9 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_= mmu_page *sp, if (is_error_pfn(pfn)) return false; =20 + if (!treat_all_userspace_as_nonsensitive) + asi_map_gfn_range(vcpu, slot, gfn, 1); + mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL); kvm_release_pfn_clean(pfn); return true; @@ -836,6 +839,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault int r; unsigned long mmu_seq; bool is_self_change_mapping; + bool try_asi_map; =20 pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_cod= e); WARN_ON_ONCE(fault->is_tdp); @@ -890,6 +894,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, st= ruct kvm_page_fault *fault if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r)) return r; =20 + try_asi_map =3D !treat_all_userspace_as_nonsensitive && + !is_noslot_pfn(fault->pfn); + + if (try_asi_map) + vcpu_fill_asi_pgtbl_pool(vcpu); + /* * Do not change pte_access if the pfn is a mmio page, otherwise * we will cache the incorrect access into mmio spte. 
@@ -919,6 +929,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, st= ruct kvm_page_fault *fault r =3D make_mmu_pages_available(vcpu); if (r) goto out_unlock; + + if (try_asi_map) + asi_map_gfn_range(vcpu, fault->slot, walker.gfn, 1); + r =3D FNAME(fetch)(vcpu, fault, &walker); kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dd07f677d084..d0df14deae80 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8722,7 +8722,10 @@ int kvm_arch_init(void *opaque) goto out_free_percpu; =20 if (ops->runtime_ops->flush_sensitive_cpu_state) { - r =3D asi_register_class("KVM", ASI_MAP_STANDARD_NONSENSITIVE, + r =3D asi_register_class("KVM", + ASI_MAP_STANDARD_NONSENSITIVE | + (treat_all_userspace_as_nonsensitive ? + ASI_MAP_ALL_USERSPACE : 0), &kvm_asi_hooks); if (r < 0) goto out_mmu_exit; @@ -9675,6 +9678,17 @@ void kvm_arch_mmu_notifier_invalidate_range(struct k= vm *kvm, apic_address =3D gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); if (start <=3D apic_address && apic_address < end) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); + + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, (void *)start, end - start); +} + +void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end) +{ + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, (void *)start, end - start); } =20 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) @@ -11874,6 +11888,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, =20 void kvm_arch_flush_shadow_all(struct kvm *kvm) { + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, 0, TASK_SIZE_MAX); + kvm_mmu_zap_all(kvm); } =20 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9dd63ed21f75..f31f7442eced 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1819,6 +1819,9 @@ static inline long 
kvm_arch_vcpu_async_ioctl(struct f= ile *filp, =20 void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, unsigned long start, unsigned long end); +void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end); =20 #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 72c4e6b39389..e8e9c8588908 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -162,6 +162,12 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(str= uct kvm *kvm, { } =20 +__weak void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end) +{ +} + bool kvm_is_zone_device_pfn(kvm_pfn_t pfn) { /* @@ -685,6 +691,7 @@ static int kvm_mmu_notifier_invalidate_range_start(stru= ct mmu_notifier *mn, spin_unlock(&kvm->mn_invalidate_lock); =20 __kvm_handle_hva_range(kvm, &hva_range); + kvm_arch_mmu_notifier_invalidate_range_start(kvm, range->start, range->en= d); =20 return 0; } --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61568C433F5 for ; Wed, 23 Feb 2022 05:28:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235815AbiBWF2v (ORCPT ); Wed, 23 Feb 2022 00:28:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238380AbiBWF0u (ORCPT ); Wed, 23 Feb 2022 00:26:50 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2EA76A075 for ; Tue, 22 Feb 2022 21:25:12 -0800 (PST) Received: by 
mail-yb1-xb49.google.com with SMTP id a12-20020a056902056c00b0061dc0f2a94aso26532942ybt.6 for ; Tue, 22 Feb 2022 21:25:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ADygLqIPpXiMqBEcZG+FIIc+2sjjDL7sC0SphzjVQM8=; b=XTSLmsGvhvcUg1TWPppXBzPUQLRac82+QtTaWAJbfENnf+LFsbfGfYAYKSS2y7fP86 xw06QgEtf7pep5PqE6Q1aM+NyxahQ7SmgpjVkx80n1ORA4q/bqa1CVOsjpTMMWmZloKQ uucN4SETymYCsva4W1OHKwzmwTJIk9TZAOYPJ1y+ZPVUmyQQ1PbH7n1l3q9WUPnCWwcM Cb3CxmDXmSaVEz0TvEtPSskpBLRm0AocmFGmy3Znq/tmFUH/ahr88pzMeI+oucZvgSN3 VZ4LlqahzSCUtRhfh8h6InSjlirWGXuPDeznUvG9vYUdMzAnIxg/DQM0QaE6Bs8x3s0G p5Tg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ADygLqIPpXiMqBEcZG+FIIc+2sjjDL7sC0SphzjVQM8=; b=A12OxfiFRgz+hVeEtz8r1nWN37ez2t5hSJgXMEfHRMXQAvWA89d4bbnYC1HRWR3NCa 3pubBh3ZhcV82u9VtJxbZc1L/iqA0dXDgddEMVlh07jzZhcgSi+VTYIJk4lgKVr72lJ+ qa1u4hD1TPFffxOmDzuuEgHNJEvc2U7Hbh5xZj0o8sOkI9yVnBMg5Jn3ufSQ5vOWwxRp n7QqjuwoKZMe6PnvptftcgLdLE/su71oKRSXoRRtpChZC7rvuoqTSi21naHAfHN2zxYm KLeD71P6DDRmONyZB1hnOukt3U5Gs5KtbCY1SBwM02JCV9mUBlW/hWHMGCQZyn+IUxC+ rzIQ== X-Gm-Message-State: AOAM532hwSiIPJKdkWSn2/stglxgICqJf0w1SmyZJDBJq14om/3F+aSd hDmmh3E++0SqprliSze4j3mgrPZwXi87txeSPpyEjbva8g1sPn0jwh/+ZkjllnG/d0PXy4+hv6T pZzezFpTiAz6TWWsiqIIjRl3+XW+nH5WqAvEsYSoljvSuGOVh6ccsoJ7c1tE7Np452Ff/Lf5W X-Google-Smtp-Source: ABdhPJxnEaZg6RVeib/qxGX0ultucEuvlDuM1/3KZuHHbAgoAGzQNWHho88GWzU0EQgaqkKLhw8qsZUiJNi/ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:354f:0:b0:2d0:e91f:c26 with SMTP id c76-20020a81354f000000b002d0e91f0c26mr27033178ywa.318.1645593901360; Tue, 22 Feb 2022 21:25:01 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:10 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: 
<20220223052223.1202152-35-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 34/47] kvm: asi: Unmap guest memory from ASI address space when using nested virt From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" L1 guest memory as a whole cannot be considered non-sensitive when an L2 is running. Even if L1 is using its own mitigations, L2 VM Exits could, in theory, bring into the cache some sensitive L1 memory without L1 getting a chance to flush it. For simplicity, we just unmap the entire L1 memory from the ASI restricted address space when nested virtualization is turned on. Though this is overridden if the treat_all_userspace_as_nonsensitive flag is enabled. In the future, we could potentially map some portions of L1 memory which are known to contain non-sensitive memory, which would reduce ASI overhead during nested virtualization. Note that unmapping the guest memory still leaves a slight hole because L2 could also potentially access copies of L1 VCPU registers stored in L0 kernel structures. In the future, this could be mitigated by having a separate ASI address space for each VCPU and treating the associated structures as locally non-sensitive only within that VCPU's ASI address space. 
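As an illustration only, the gating described above can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel code: struct vm_model and the model_* functions are hypothetical names, the MMU lock and the X86_FEATURE_ASI check are left out, and unmapping is reduced to zeroing a page counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VM state this patch touches. */
struct vm_model {
	int nested_virt_enabled_count;	 /* VCPUs that entered VMX operation */
	size_t mapped_pages;		 /* guest pages in the restricted space */
	bool all_userspace_nonsensitive; /* models the module parameter */
};

/* Models asi_map_gfn_range(): map nothing while any VCPU uses nested virt. */
static size_t model_map_gfn_range(struct vm_model *vm, size_t npages)
{
	if (vm->nested_virt_enabled_count)
		return 0;
	vm->mapped_pages += npages;
	return npages;
}

/* Models enter_vmx_operation(): bump the count, then drop all mappings. */
static void model_enter_vmx(struct vm_model *vm)
{
	if (vm->all_userspace_nonsensitive)
		return;	/* the override: the count is not maintained at all */
	/* The patch performs this increment under write_lock(mmu_lock). */
	assert(vm->nested_virt_enabled_count >= 0);
	vm->nested_virt_enabled_count++;
	/* Stands in for asi_unmap_user(asi, 0, TASK_SIZE_MAX). */
	vm->mapped_pages = 0;
}

/* Models free_nested(): drop this VCPU's contribution to the count. */
static void model_free_nested(struct vm_model *vm)
{
	if (vm->all_userspace_nonsensitive)
		return;
	assert(vm->nested_virt_enabled_count > 0);
	vm->nested_virt_enabled_count--;
}
```

In this model, a map request issued between model_enter_vmx() and the matching model_free_nested() maps nothing, and mapping resumes once the count drops back to zero. The real code takes the MMU lock around the counter updates so that they cannot race with asi_map_gfn_range().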
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/mmu/mmu.c | 10 ++++++++++ arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index e63a2f244d7b..8ba88bbcf895 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1200,6 +1200,12 @@ struct kvm_arch { */ struct list_head tdp_mmu_pages; =20 + /* + * Number of VCPUs that have enabled nested virtualization. + * Currently only maintained when ASI is enabled. + */ + int nested_virt_enabled_count; + /* * Protects accesses to the following fields when the MMU lock * is held in read mode: diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 485c0ba3ce8b..5785a0d02558 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -94,6 +94,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_o= n_reuse, bool, 0644); #ifdef CONFIG_ADDRESS_SPACE_ISOLATION bool __ro_after_init treat_all_userspace_as_nonsensitive; module_param(treat_all_userspace_as_nonsensitive, bool, 0444); +EXPORT_SYMBOL_GPL(treat_all_userspace_as_nonsensitive); #endif =20 /* @@ -2769,6 +2770,15 @@ static void asi_map_gfn_range(struct kvm_vcpu *vcpu, int err; size_t hva =3D __gfn_to_hva_memslot(slot, gfn); =20 + /* + * For now, we just don't map any guest memory when using nested + * virtualization. In the future, we could potentially map some + * portions of guest memory which are known to contain only memory + * which would be considered non-sensitive. 
+ */ + if (vcpu->kvm->arch.nested_virt_enabled_count) + return; + err =3D asi_map_user(vcpu->kvm->asi, (void *)hva, PAGE_SIZE * npages, &vcpu->arch.asi_pgtbl_pool, slot->userspace_addr, slot->userspace_addr + slot->npages * PAGE_SIZE); diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 9c941535f78c..0a0092e4102d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -318,6 +318,14 @@ static void free_nested(struct kvm_vcpu *vcpu) nested_release_evmcs(vcpu); =20 free_loaded_vmcs(&vmx->nested.vmcs02); + + if (cpu_feature_enabled(X86_FEATURE_ASI) && + !treat_all_userspace_as_nonsensitive) { + write_lock(&vcpu->kvm->mmu_lock); + WARN_ON(vcpu->kvm->arch.nested_virt_enabled_count <=3D 0); + vcpu->kvm->arch.nested_virt_enabled_count--; + write_unlock(&vcpu->kvm->mmu_lock); + } } =20 /* @@ -4876,6 +4884,20 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) pt_update_intercept_for_msr(vcpu); } =20 + if (cpu_feature_enabled(X86_FEATURE_ASI) && + !treat_all_userspace_as_nonsensitive) { + /* + * We do the increment under the MMU lock in order to prevent + * it from happening concurrently with asi_map_gfn_range(). 
+ */ + write_lock(&vcpu->kvm->mmu_lock); + WARN_ON(vcpu->kvm->arch.nested_virt_enabled_count < 0); + vcpu->kvm->arch.nested_virt_enabled_count++; + write_unlock(&vcpu->kvm->mmu_lock); + + asi_unmap_user(vcpu->kvm->asi, 0, TASK_SIZE_MAX); + } + return 0; =20 out_shadow_vmcs: --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1E9CC4167E for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238706AbiBWF2O (ORCPT ); Wed, 23 Feb 2022 00:28:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238389AbiBWF0v (ORCPT ); Wed, 23 Feb 2022 00:26:51 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC6006AA44 for ; Tue, 22 Feb 2022 21:25:14 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id g205-20020a2552d6000000b0061e1843b8edso26635752ybb.18 for ; Tue, 22 Feb 2022 21:25:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=tRhCbGDitng8H5bHVni9LeLhw+OutL5exoDLV2q/On1vnMgXPuEEjcg+LbKI1cVqrD Vyo7V47FL7/0MDpSopFI7nReIbg7le3SY0oaOVkKdiMGKdDx7pdDSJXZaaKKRg410xu2 t4RPD/m0Rz3825891oGpbAo2zhoskX0LmYLk/s6S6UydDxkiw0PDjI+nunjVplgqDwrA +5Jz1StT9q9LpFCspfDpRycEYKD4vtMQDfcSS06Robo3QGT5pJP4suYnnRPaHrjD0ty7 l5qUqZO6zEun+3vP4Gykt/1OOPF+J93aOR22CevdnVkqI4Do06KPWQJ6jdbyHleq2VP2 sP7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=kXc4MM+FP9iLh+AtoIdKWczCPVElLVZ3vZGqeGJy95HHm4a3BD4pw99I7twQhPbdQZ eKh5HJFbjli9CeV/ywaXiYq3NEzyI4IdA2m1TBRsKPJBpsltVV/2EOEOYfAhm7XsyBnY 1PtYjieCKrBpfoGF+2wFBBBi6u48MW23IV5RtvIevQZOOhpM1ORQbGEE3RYakTPSPzu0 EgSgNMeXIye/gOK7Pqf+Y6VSL9//nstYXve3EeA+V/YpGf8GqsPiZ/1wW7E+b3XCYnH2 izojfUD6Y7abyRlBi9vBrWH2Q/XX6Vssx7Wnv8fPxvAzH4ds6uwT147HZGiNfaYoPl5Q j6Fw== X-Gm-Message-State: AOAM533pBLMtZPO3vmUnh3fLIs/GA3X/PkM+WrbWACC9ezzFgabOTa09 Gz0Rgfs4c/WiaH52XJhSqf8Mb6HvTtSVMuctiI0sh6qTuqqv3S6vH7RNdNOej20NCSKHwLLBhqY uuBtGjQ2ZMV4SYKCs0sUhzmTrvQNxK4jMp0z7MH4Hzv0RPX84z0wSEZ/e91/m0IwuLbZXX5lz X-Google-Smtp-Source: ABdhPJxQh2hzOmUCmk3wlhtqctWGzojDMn80W5x6yZGJyfVoEE9kdvLHRV4rxoztBsJ+fy5+R2LvtwTwKQ0F X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:eb09:0:b0:2d1:e0df:5104 with SMTP id u9-20020a0deb09000000b002d1e0df5104mr27669944ywe.250.1645593903681; Tue, 22 Feb 2022 21:25:03 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:11 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-36-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 35/47] mm: asi: asi_exit() on PF, skip handling if address is accessible From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse On a page-fault - do asi_exit(). 
Then check whether the address is accessible now that the exit has happened. We do this by refactoring spurious_kernel_fault() into two parts: 1. Verify that the error code value is something that could arise from a lazy TLB update. 2. Walk the page table and verify permissions; this part is now called is_address_accessible_now(). We also define PTE_PRESENT() and PMD_PRESENT(), which are suitable for checking userspace pages. For the purpose of detecting spurious faults, pte_present() and pmd_present() are only good for kernelspace pages, because these macros might return true even if the present bit is 0 (only relevant for userspace). Signed-off-by: Ofir Weisse --- arch/x86/mm/fault.c | 60 ++++++++++++++++++++++++++++++++++------ include/linux/mm_types.h | 3 ++ 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 8692eb50f4a5..d08021ba380b 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -982,6 +982,8 @@ static int spurious_kernel_fault_check(unsigned long er= ror_code, pte_t *pte) return 1; } =20 +static int is_address_accessible_now(unsigned long error_code, unsigned lo= ng address, + pgd_t *pgd); /* * Handle a spurious fault caused by a stale TLB entry. * @@ -1003,15 +1005,13 @@ static int spurious_kernel_fault_check(unsigned lon= g error_code, pte_t *pte) * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3 * (Optional Invalidation). */ +/* A spurious fault is also possible when Address Space Isolation (ASI) is= in + * use. Specifically, code running within an ASI domain touched memory ou= tside + * the domain.
This access causes a page-fault --> asi_exit() */ static noinline int spurious_kernel_fault(unsigned long error_code, unsigned long address) { pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - pte_t *pte; - int ret; =20 /* * Only writes to RO or instruction fetches from NX may cause @@ -1027,6 +1027,37 @@ spurious_kernel_fault(unsigned long error_code, unsi= gned long address) return 0; =20 pgd =3D init_mm.pgd + pgd_index(address); + return is_address_accessible_now(error_code, address, pgd); +} +NOKPROBE_SYMBOL(spurious_kernel_fault); + + +/* Check if an address (kernel or userspace) would cause a page fault if + * accessed now. + * + * For kernel addresses, pte_present and pmd_present are sufficient. For + * userspace, we must use PTE_PRESENT and PMD_PRESENT, which will only che= ck the + * present bits. + * The existing pmd_present() in arch/x86/include/asm/pgtable.h is mislead= ing. + * The PMD page might be in the middle of split_huge_page with present bit + * clear, but pmd_present will still return true. We are interested in kn= owing + * if the page is accessible to hardware - that is - the present bit is 1.= */ +#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT) + +/* pte_present will return true if _PAGE_PROTNONE is 1. We care if the har= dware + * can actually access the page right now.
*/ +#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT) + +static noinline int +is_address_accessible_now(unsigned long error_code, unsigned long address, + pgd_t *pgd) +{ + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + if (!pgd_present(*pgd)) return 0; =20 @@ -1045,14 +1076,14 @@ spurious_kernel_fault(unsigned long error_code, uns= igned long address) return spurious_kernel_fault_check(error_code, (pte_t *) pud); =20 pmd =3D pmd_offset(pud, address); - if (!pmd_present(*pmd)) + if (!PMD_PRESENT(*pmd)) return 0; =20 if (pmd_large(*pmd)) return spurious_kernel_fault_check(error_code, (pte_t *) pmd); =20 pte =3D pte_offset_kernel(pmd, address); - if (!pte_present(*pte)) + if (!PTE_PRESENT(*pte)) return 0; =20 ret =3D spurious_kernel_fault_check(error_code, pte); @@ -1068,7 +1099,6 @@ spurious_kernel_fault(unsigned long error_code, unsig= ned long address) =20 return ret; } -NOKPROBE_SYMBOL(spurious_kernel_fault); =20 int show_unhandled_signals =3D 1; =20 @@ -1504,6 +1534,20 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) * the fixup on the next page fault. */ struct asi *asi =3D asi_get_current(); + if (asi) + asi_exit(); + + /* handle_page_fault() might call BUG() if we run it for a kernel + * address. This might be the case if we got here due to an ASI fa= ult. + * We avoid this case by checking whether the address is now, afte= r a + * potential asi_exit(), accessible by hardware. If it is - there's + * nothing to do. 
+ */ + if (current && mm_asi_enabled(current->mm)) { + pgd_t *pgd =3D (pgd_t*)__va(read_cr3_pa()) + pgd_index(add= ress); + if (is_address_accessible_now(error_code, address, pgd)) + return; + } =20 prefetchw(&current->mm->mmap_lock); =20 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c3f209720a84..560909e80841 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -707,6 +707,9 @@ extern struct mm_struct init_mm; #ifdef CONFIG_ADDRESS_SPACE_ISOLATION static inline bool mm_asi_enabled(struct mm_struct *mm) { + if (!mm) + return false; + return mm->asi_enabled; } #else --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A10F3C4321E for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238652AbiBWF2L (ORCPT ); Wed, 23 Feb 2022 00:28:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238400AbiBWF0w (ORCPT ); Wed, 23 Feb 2022 00:26:52 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C26386A033 for ; Tue, 22 Feb 2022 21:25:16 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d6180e0ab4so162345607b3.2 for ; Tue, 22 Feb 2022 21:25:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Bg8ARgAvpxqrVZujUdMBpF3O5GREYgeeic1zpLXlEAs=; b=ZkYcibzizevz7miAILMO787yJLGRxl/cgzlUmAaY/wAepMJq6EHrvI2zYY3SBnMhsX y7RWUVRNbxiaiAE1t5uCyYG9PImJ+/9dR0KGIg93jTvqY0XFQDy403oNMPINt668O0wX
lN1x9DKfkMN5phGLIKP2guH1oM8X+oF56FSnIjn/s1bc0sFaDAvUiQYbfWLnDYkAyeoI kOK471SB7zkEDfAJTXNkMVhdXnFBGV5vBVLgIoK40R3LZcAduulDDg2qjj+85oLE8w4w JUmk2whW1jd4H91VlFkZkcbARwnTRCiIuS1DGvK4lzbHyv57ehOmB1Yejn5DI2lnROWR Cvjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Bg8ARgAvpxqrVZujUdMBpF3O5GREYgeeic1zpLXlEAs=; b=1G3V8GcSB7fbm1HyYfpMaPWtoSOT7PlcZ11tElWZr6Yj/76HSDTG3z3vDx56RzIvtX YBBLhJmbOBnwcCEoNWHyaVsrddY+Jm3XaFIP2tZMnaaJ7SL4SoyhE0wG8ViQI1SbSDMu E5YZa4EIjNXPunSVhUY4t2ty3xq0MrzmBJWHUKNt+4rGlU7imGJN5Msv6cqXdD7yMW8E HhdB1+Tt0/9A7GC2Ar4GnrxJi9knbihcA+404RgxNJw7aFL730xjgAtSxghQNjZIo09n mZLKFOa3RYWpqzjpVuYCtOeiu8rv/Ji40fvkoaEfMp0E2eJ790Wr5ycpaOWc3y6HCv7D gSYA== X-Gm-Message-State: AOAM532dTxc4xXBh4VhviMKltgYT2/YW+H9qOpa9qSUY6ZbwCgsSj6Sj cIqeLEgLiCJrMGP4j3/0Fdy2JJ1i+Yqu6vU05qFjzpRmjTe1uyuBCal+LAgGb2kwcfyP2ogqR2I 5t5pTkcGX4Fl4+H0vb5NGTQyVV0tD3Is6vrSBCwnHrGcu1mDiKgI42x1ZK2Xy6B4j6T3xJhIS X-Google-Smtp-Source: ABdhPJxyXiC19n49tKMSkgiaHfTQpZqUfpkjxmpDblETVmyKjxfScd3DY0k63b7nH5huwuEYpTcB2HPTnvtH X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6993:0:b0:624:55af:336c with SMTP id e141-20020a256993000000b0062455af336cmr19351739ybc.412.1645593905875; Tue, 22 Feb 2022 21:25:05 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:12 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-37-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 36/47] mm: asi: Adding support for dynamic percpu ASI allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, 
dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse Adding infrastructure to support pcpu_alloc with gfp flag of __GFP_GLOBAL_NONSENSITIVE. We use a similar mechanism as the earlier infrastructure for memcg percpu allocations and add pcpu type PCPU_CHUNK_ASI_NONSENSITIVE. pcpu_chunk_list(PCPU_CHUNK_ASI_NONSENSITIVE) will return a list of ASI nonsensitive percpu chunks, allowing most of the code to be unchanged. Signed-off-by: Ofir Weisse --- mm/percpu-internal.h | 23 ++++++- mm/percpu-km.c | 5 +- mm/percpu-vm.c | 6 +- mm/percpu.c | 139 ++++++++++++++++++++++++++++++++++--------- 4 files changed, 141 insertions(+), 32 deletions(-) diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h index 639662c20c82..2fac01114edc 100644 --- a/mm/percpu-internal.h +++ b/mm/percpu-internal.h @@ -5,6 +5,15 @@ #include #include =20 +enum pcpu_chunk_type { + PCPU_CHUNK_ROOT, +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + PCPU_CHUNK_ASI_NONSENSITIVE, +#endif + PCPU_NR_CHUNK_TYPES, + PCPU_FAIL_ALLOC =3D PCPU_NR_CHUNK_TYPES +}; + /* * pcpu_block_md is the metadata block struct. * Each chunk's bitmap is split into a number of full blocks. 
@@ -59,6 +68,9 @@ struct pcpu_chunk { #ifdef CONFIG_MEMCG_KMEM struct obj_cgroup **obj_cgroups; /* vector of object cgroups */ #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + bool is_asi_nonsensitive; /* ASI nonsensitive c= hunk */ +#endif =20 int nr_pages; /* # of pages served by this chunk */ int nr_populated; /* # of populated pages */ @@ -68,7 +80,7 @@ struct pcpu_chunk { =20 extern spinlock_t pcpu_lock; =20 -extern struct list_head *pcpu_chunk_lists; +extern struct list_head *pcpu_chunk_lists[PCPU_NR_CHUNK_TYPES]; extern int pcpu_nr_slots; extern int pcpu_sidelined_slot; extern int pcpu_to_depopulate_slot; @@ -113,6 +125,15 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chun= k *chunk) return pcpu_nr_pages_to_map_bits(chunk->nr_pages); } =20 +static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chun= k) +{ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (chunk->is_asi_nonsensitive) + return PCPU_CHUNK_ASI_NONSENSITIVE; +#endif + return PCPU_CHUNK_ROOT; +} + #ifdef CONFIG_PERCPU_STATS =20 #include diff --git a/mm/percpu-km.c b/mm/percpu-km.c index fe31aa19db81..01e31bd55860 100644 --- a/mm/percpu-km.c +++ b/mm/percpu-km.c @@ -50,7 +50,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chun= k, /* nada */ } =20 -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { const int nr_pages =3D pcpu_group_sizes[0] >> PAGE_SHIFT; struct pcpu_chunk *chunk; @@ -58,7 +59,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) unsigned long flags; int i; =20 - chunk =3D pcpu_alloc_chunk(gfp); + chunk =3D pcpu_alloc_chunk(type, gfp); if (!chunk) return NULL; =20 diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c index 5579a96ad782..59f3b55abdd1 100644 --- a/mm/percpu-vm.c +++ b/mm/percpu-vm.c @@ -357,7 +357,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *ch= unk, pcpu_free_pages(chunk, pages, page_start, page_end); } =20 -static struct pcpu_chunk 
*pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { struct pcpu_chunk *chunk; struct vm_struct **vms; @@ -368,7 +369,8 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) =20 gfp &=3D ~__GFP_GLOBAL_NONSENSITIVE; =20 - chunk =3D pcpu_alloc_chunk(gfp); + chunk =3D pcpu_alloc_chunk(type, gfp); + if (!chunk) return NULL; =20 diff --git a/mm/percpu.c b/mm/percpu.c index f5b2c2ea5a54..beaca5adf9d4 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -172,7 +172,7 @@ struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init; DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */ static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, m= ap ext */ =20 -struct list_head *pcpu_chunk_lists __ro_after_init; /* chunk list slots */ +struct list_head *pcpu_chunk_lists[PCPU_NR_CHUNK_TYPES] __ro_after_init; /= * chunk list slots */ =20 /* chunks which need their map areas extended, protected by pcpu_lock */ static LIST_HEAD(pcpu_map_extend_chunks); @@ -531,10 +531,12 @@ static void __pcpu_chunk_move(struct pcpu_chunk *chun= k, int slot, bool move_front) { if (chunk !=3D pcpu_reserved_chunk) { + struct list_head *pcpu_type_lists =3D + pcpu_chunk_lists[pcpu_chunk_type(chunk)]; if (move_front) - list_move(&chunk->list, &pcpu_chunk_lists[slot]); + list_move(&chunk->list, &pcpu_type_lists[slot]); else - list_move_tail(&chunk->list, &pcpu_chunk_lists[slot]); + list_move_tail(&chunk->list, &pcpu_type_lists[slot]); } } =20 @@ -570,13 +572,16 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *ch= unk, int oslot) =20 static void pcpu_isolate_chunk(struct pcpu_chunk *chunk) { + struct list_head *pcpu_type_lists =3D + pcpu_chunk_lists[pcpu_chunk_type(chunk)]; + lockdep_assert_held(&pcpu_lock); =20 if (!chunk->isolated) { chunk->isolated =3D true; pcpu_nr_empty_pop_pages -=3D chunk->nr_empty_pop_pages; } - list_move(&chunk->list, &pcpu_chunk_lists[pcpu_to_depopulate_slot]); + list_move(&chunk->list, 
&pcpu_type_lists[pcpu_to_depopulate_slot]); } =20 static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk) @@ -1438,7 +1443,8 @@ static struct pcpu_chunk * __init pcpu_alloc_first_ch= unk(unsigned long tmp_addr, return chunk; } =20 -static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_alloc_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { struct pcpu_chunk *chunk; int region_bits; @@ -1475,6 +1481,13 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) goto objcg_fail; } #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* TODO: (oweisse) do asi_map for nonsensitive chunks */ + if (type =3D=3D PCPU_CHUNK_ASI_NONSENSITIVE) + chunk->is_asi_nonsensitive =3D true; + else + chunk->is_asi_nonsensitive =3D false; +#endif =20 pcpu_init_md_blocks(chunk); =20 @@ -1580,7 +1593,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *= chunk, int page_start, int page_end); static void pcpu_post_unmap_tlb_flush(struct pcpu_chunk *chunk, int page_start, int page_end); -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp); +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp); static void pcpu_destroy_chunk(struct pcpu_chunk *chunk); static struct page *pcpu_addr_to_page(void *addr); static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai); @@ -1733,6 +1747,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t = align, bool reserved, unsigned long flags; void __percpu *ptr; size_t bits, bit_align; + enum pcpu_chunk_type type; + struct list_head *pcpu_type_lists; =20 gfp =3D current_gfp_context(gfp); /* whitelisted flags that can be passed to the backing allocators */ @@ -1763,6 +1779,16 @@ static void __percpu *pcpu_alloc(size_t size, size_t= align, bool reserved, if (unlikely(!pcpu_memcg_pre_alloc_hook(size, gfp, &objcg))) return NULL; =20 + type =3D PCPU_CHUNK_ROOT; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) { + type =3D 
PCPU_CHUNK_ASI_NONSENSITIVE; + pcpu_gfp |=3D __GFP_GLOBAL_NONSENSITIVE; + } +#endif + pcpu_type_lists =3D pcpu_chunk_lists[type]; + BUG_ON(!pcpu_type_lists); + if (!is_atomic) { /* * pcpu_balance_workfn() allocates memory under this mutex, @@ -1800,7 +1826,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t = align, bool reserved, restart: /* search through normal chunks */ for (slot =3D pcpu_size_to_slot(size); slot <=3D pcpu_free_slot; slot++) { - list_for_each_entry_safe(chunk, next, &pcpu_chunk_lists[slot], + list_for_each_entry_safe(chunk, next, &pcpu_type_lists[slot], list) { off =3D pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); @@ -1830,8 +1856,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t = align, bool reserved, goto fail; } =20 - if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) { - chunk =3D pcpu_create_chunk(pcpu_gfp); + if (list_empty(&pcpu_type_lists[pcpu_free_slot])) { + chunk =3D pcpu_create_chunk(type, pcpu_gfp); if (!chunk) { err =3D "failed to allocate new chunk"; goto fail; @@ -1983,12 +2009,19 @@ void __percpu *__alloc_reserved_percpu(size_t size,= size_t align) * CONTEXT: * pcpu_lock (can be dropped temporarily) */ -static void pcpu_balance_free(bool empty_only) + +static void __pcpu_balance_free(bool empty_only, + enum pcpu_chunk_type type) { LIST_HEAD(to_free); - struct list_head *free_head =3D &pcpu_chunk_lists[pcpu_free_slot]; + struct list_head *pcpu_type_lists =3D pcpu_chunk_lists[type]; + struct list_head *free_head; struct pcpu_chunk *chunk, *next; =20 + if (!pcpu_type_lists) + return; + free_head =3D &pcpu_type_lists[pcpu_free_slot]; + lockdep_assert_held(&pcpu_lock); =20 /* @@ -2026,6 +2059,14 @@ static void pcpu_balance_free(bool empty_only) spin_lock_irq(&pcpu_lock); } =20 +static void pcpu_balance_free(bool empty_only) +{ + enum pcpu_chunk_type type; + for (type =3D 0; type < PCPU_NR_CHUNK_TYPES; type++) { + __pcpu_balance_free(empty_only, type); + } +} + /** * pcpu_balance_populated - manage the amount 
of populated pages * @@ -2038,12 +2079,21 @@ static void pcpu_balance_free(bool empty_only) * CONTEXT: * pcpu_lock (can be dropped temporarily) */ -static void pcpu_balance_populated(void) +static void __pcpu_balance_populated(enum pcpu_chunk_type type) { /* gfp flags passed to underlying allocators */ - const gfp_t gfp =3D GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; + const gfp_t gfp =3D GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + | (type =3D=3D PCPU_CHUNK_ASI_NONSENSITIVE ? + __GFP_GLOBAL_NONSENSITIVE : 0) +#endif + ; struct pcpu_chunk *chunk; int slot, nr_to_pop, ret; + struct list_head *pcpu_type_lists =3D pcpu_chunk_lists[type]; + + if (!pcpu_type_lists) + return; =20 lockdep_assert_held(&pcpu_lock); =20 @@ -2074,7 +2124,7 @@ static void pcpu_balance_populated(void) if (!nr_to_pop) break; =20 - list_for_each_entry(chunk, &pcpu_chunk_lists[slot], list) { + list_for_each_entry(chunk, &pcpu_type_lists[slot], list) { nr_unpop =3D chunk->nr_pages - chunk->nr_populated; if (nr_unpop) break; @@ -2107,7 +2157,7 @@ static void pcpu_balance_populated(void) if (nr_to_pop) { /* ran out of chunks to populate, create a new one and retry */ spin_unlock_irq(&pcpu_lock); - chunk =3D pcpu_create_chunk(gfp); + chunk =3D pcpu_create_chunk(type, gfp); cond_resched(); spin_lock_irq(&pcpu_lock); if (chunk) { @@ -2117,6 +2167,14 @@ static void pcpu_balance_populated(void) } } =20 +static void pcpu_balance_populated(void) +{ + enum pcpu_chunk_type type; + + for (type =3D 0; type < PCPU_NR_CHUNK_TYPES; type++) + __pcpu_balance_populated(type); +} + /** * pcpu_reclaim_populated - scan over to_depopulate chunks and free empty = pages * @@ -2132,13 +2190,19 @@ static void pcpu_balance_populated(void) * pcpu_lock (can be dropped temporarily) * */ -static void pcpu_reclaim_populated(void) + + +static void __pcpu_reclaim_populated(enum pcpu_chunk_type type) { struct pcpu_chunk *chunk; struct pcpu_block_md *block; int freed_page_start, freed_page_end; int i,
end; bool reintegrate; + struct list_head *pcpu_type_lists =3D pcpu_chunk_lists[type]; + + if (!pcpu_type_lists) + return; =20 lockdep_assert_held(&pcpu_lock); =20 @@ -2148,8 +2212,8 @@ static void pcpu_reclaim_populated(void) * other accessor is the free path which only returns area back to the * allocator not touching the populated bitmap. */ - while (!list_empty(&pcpu_chunk_lists[pcpu_to_depopulate_slot])) { - chunk =3D list_first_entry(&pcpu_chunk_lists[pcpu_to_depopulate_slot], + while (!list_empty(&pcpu_type_lists[pcpu_to_depopulate_slot])) { + chunk =3D list_first_entry(&pcpu_type_lists[pcpu_to_depopulate_slot], struct pcpu_chunk, list); WARN_ON(chunk->immutable); =20 @@ -2219,10 +2283,18 @@ static void pcpu_reclaim_populated(void) pcpu_reintegrate_chunk(chunk); else list_move_tail(&chunk->list, - &pcpu_chunk_lists[pcpu_sidelined_slot]); + &pcpu_type_lists[pcpu_sidelined_slot]); } } =20 +static void pcpu_reclaim_populated(void) +{ + enum pcpu_chunk_type type; + for (type =3D 0; type < PCPU_NR_CHUNK_TYPES; type++) { + __pcpu_reclaim_populated(type); + } +} + /** * pcpu_balance_workfn - manage the amount of free chunks and populated pa= ges * @work: unused @@ -2268,6 +2340,7 @@ void free_percpu(void __percpu *ptr) unsigned long flags; int size, off; bool need_balance =3D false; + struct list_head *pcpu_type_lists =3D NULL; =20 if (!ptr) return; @@ -2280,6 +2353,8 @@ void free_percpu(void __percpu *ptr) =20 chunk =3D pcpu_chunk_addr_search(addr); off =3D addr - chunk->base_addr; + pcpu_type_lists =3D pcpu_chunk_lists[pcpu_chunk_type(chunk)]; + BUG_ON(!pcpu_type_lists); =20 size =3D pcpu_free_area(chunk, off); =20 @@ -2293,7 +2368,7 @@ void free_percpu(void __percpu *ptr) if (!chunk->isolated && chunk->free_bytes =3D=3D pcpu_unit_size) { struct pcpu_chunk *pos; =20 - list_for_each_entry(pos, &pcpu_chunk_lists[pcpu_free_slot], list) + list_for_each_entry(pos, &pcpu_type_lists[pcpu_free_slot], list) if (pos !=3D chunk) { need_balance =3D true; break; @@ -2601,6 
+2676,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_= alloc_info *ai, int map_size; unsigned long tmp_addr; size_t alloc_size; + enum pcpu_chunk_type type; =20 #define PCPU_SETUP_BUG_ON(cond) do { \ if (unlikely(cond)) { \ @@ -2723,15 +2799,24 @@ void __init pcpu_setup_first_chunk(const struct pcp= u_alloc_info *ai, pcpu_free_slot =3D pcpu_sidelined_slot + 1; pcpu_to_depopulate_slot =3D pcpu_free_slot + 1; pcpu_nr_slots =3D pcpu_to_depopulate_slot + 1; - pcpu_chunk_lists =3D memblock_alloc(pcpu_nr_slots * - sizeof(pcpu_chunk_lists[0]), + for (type =3D 0; type < PCPU_NR_CHUNK_TYPES; type++) { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (type =3D=3D PCPU_CHUNK_ASI_NONSENSITIVE && + !static_asi_enabled()) { + pcpu_chunk_lists[type] =3D NULL; + continue; + } +#endif + pcpu_chunk_lists[type] =3D memblock_alloc(pcpu_nr_slots * + sizeof(pcpu_chunk_lists[0][0]), SMP_CACHE_BYTES); - if (!pcpu_chunk_lists) - panic("%s: Failed to allocate %zu bytes\n", __func__, - pcpu_nr_slots * sizeof(pcpu_chunk_lists[0])); + if (!pcpu_chunk_lists[type]) + panic("%s: Failed to allocate %zu bytes\n", __func= __, + pcpu_nr_slots * sizeof(pcpu_chunk_lists[0][0= ])); =20 - for (i =3D 0; i < pcpu_nr_slots; i++) - INIT_LIST_HEAD(&pcpu_chunk_lists[i]); + for (i =3D 0; i < pcpu_nr_slots; i++) + INIT_LIST_HEAD(&pcpu_chunk_lists[type][i]); + } =20 /* * The end of the static region needs to be aligned with the --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E3F6C4167B for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238616AbiBWF2E (ORCPT ); Wed, 23 Feb 2022 00:28:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57702 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238402AbiBWF0w (ORCPT ); Wed, 23 Feb 2022 00:26:52 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E16006A06E for ; Tue, 22 Feb 2022 21:25:17 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d306e372e5so163264787b3.5 for ; Tue, 22 Feb 2022 21:25:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=MBcUE/GjgKAOqkE7k3I16nNGFc2xdrG3rcZqdKaqU1M=; b=huyqkiXgbsdUUaeO2YcSpDguTaIsD2ZiHBttX5Znp2Wu7SotB1quckmXNNZhq3hhVZ CvfghBA5TRiyV7nNwREBXze4Ut4RZl9/dXE69gre6OrHccOhGsM1R39ve/3Fg95dFOoY BvDeP6788/BG9zG4uemrkiHvTMwkxVsftUJqZuGB4ivr+/aWtu4qk7dIqmSXEwo1GhxB UQrfqBGsjla2ZXhbWoUWiWOKjoE4zvV4OfIuCxJHeuNkl1zKwODoNYWYiaOMSwj9wOKz BS0K9yywZ4zV80Is1LL2lLof7xLiZKUcwJTWCzQDyX5SOghoozr2KNat8Mt1uT82lph4 NHIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=MBcUE/GjgKAOqkE7k3I16nNGFc2xdrG3rcZqdKaqU1M=; b=fadY7nuPGCjs2m8c4anZz0ZNASynst2NWM9WoKFAF/r7bGQFanBdCtVezGRTI8CEPY 34kHJBWq5HrUHPp7jsBSxkBsoEv/PKqZS2gLO7k0h43gHRZqeC/5te7rMJgIVRvrY9g3 gBQtPURAyXr6dyYpU67isYyWlObJUvQKCj2CgIEsGhjk67xn3vgUae5ka2hWbdimimVS 2da3NWWv9m759gbG2kFo9UN0qbH2+8cocjE5V7bWuFVbIII9tZGsOWff0LoxvPqIJGr2 AJqT5IHXMw8aoQWEjXzbwrL/zYhuPnYcp8qLL7IgnsYxwDrGvxyYGTenJPiU4iPKIY9x j+vw== X-Gm-Message-State: AOAM531v3k0x79rP3cGruSJV4dXneT/qNfNlbQpUMEv4J1aIe6eobO87 r9wbpU4PFKviCkvtcJ3TMGa3GrvdKF0G6jFhw6vI3lsKilnwpzlfBvYcwFjHgwCZeYJ2iNlR+q7 uUcYFbWDmcsFi/IwkqDLOlbRYnWgjUq//DnYceEofJG8HxVQJlVIljlIkGv9hX+x2fyyhNTwV X-Google-Smtp-Source: ABdhPJzeh7uokPy0ce1B9Y+L+/uH+xpWq2sOY5OlnwbNrO2jfXStVDw+VQrszqUGfmnR9pf7y3TYhwx7yyIZ X-Received: from js-desktop.svl.corp.google.com 
([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7d56:0:b0:2d6:90d9:770c with SMTP id y83-20020a817d56000000b002d690d9770cmr26589608ywc.277.1645593908116; Tue, 22 Feb 2022 21:25:08 -0800 (PST)
Date: Tue, 22 Feb 2022 21:22:13 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-38-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 37/47] mm: asi: ASI annotation support for static variables.
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: Ofir Weisse, kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Ofir Weisse

Add the following annotations:

__asi_not_sensitive: for static variables that are considered not
sensitive.

__asi_not_sensitive_readmostly: similar to __read_mostly, for
non-sensitive static variables.
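
For illustration only (this is not part of the patch): a minimal userspace
sketch of how a section-based annotation of this kind can be applied and
checked. It assumes GCC or Clang on an ELF target. The section name
"asi_non_sensitive", the variables, and the helpers asi_is_not_sensitive()
and asi_demo_selfcheck() are hypothetical stand-ins; the kernel annotation
expands to __section(".data..asi_non_sensitive") instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: userspace stand-in for the kernel annotation. The kernel macro
 * uses __section(".data..asi_non_sensitive"); a name that is a valid C
 * identifier is used here so that the ELF linker provides the
 * __start_/__stop_ bounds automatically.
 */
#define __asi_not_sensitive __attribute__((section("asi_non_sensitive"), used))

/* Section bounds, provided by the linker (no linker script needed here). */
extern char __start_asi_non_sensitive[];
extern char __stop_asi_non_sensitive[];

/* Annotated: placed in the dedicated non-sensitive section. */
int nr_widgets __asi_not_sensitive = 4;

/* Not annotated: stays in regular .data and is treated as sensitive. */
int secret_key = 42;

/* Hypothetical helper: true if p lies inside the non-sensitive section. */
bool asi_is_not_sensitive(const void *p)
{
	uintptr_t addr = (uintptr_t)p;

	return addr >= (uintptr_t)__start_asi_non_sensitive &&
	       addr < (uintptr_t)__stop_asi_non_sensitive;
}

/* Sanity check of the placement; aborts if an assumption does not hold. */
void asi_demo_selfcheck(void)
{
	assert(asi_is_not_sensitive(&nr_widgets));
	assert(!asi_is_not_sensitive(&secret_key));
}
```

Calling asi_demo_selfcheck() from a test program's main() exercises the
check. In the kernel, the section bounds would come from the linker script
symbols added by this patch (__start_asi_nonsensitive and
__end_asi_nonsensitive) rather than from automatic __start_/__stop_ symbols.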
Signed-off-by: Ofir Weisse
---
 arch/x86/include/asm/asi.h        | 12 ++++++++++++
 include/asm-generic/asi.h         |  6 ++++++
 include/asm-generic/vmlinux.lds.h | 18 +++++++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index bdb2f70d4f85..6dd9c7c8a2b8 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -177,6 +177,18 @@ static inline pgd_t *asi_pgd(struct asi *asi)
 	return asi->pgd;
 }
=20
+/* IMPORTANT: Any modification to the name here should also be applied to
+ * include/asm-generic/vmlinux.lds.h */
+#define ASI_NON_SENSITIVE_SECTION_NAME ".data..asi_non_sensitive"
+#define ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME \
+	".data..asi_non_sensitive_readmostly"
+
+#define __asi_not_sensitive \
+	__section(ASI_NON_SENSITIVE_SECTION_NAME)
+
+#define __asi_not_sensitive_readmostly \
+	__section(ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME)
+
 #else /* CONFIG_ADDRESS_SPACE_ISOLATION */
=20
 static inline void asi_intr_enter(void) { }
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index fffb323d2a00..d9082267a5dd 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -121,6 +121,12 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, =
size_t len) { }
 #define static_asi_enabled() false
=20
=20
+/* IMPORTANT: Any modification to the name here should also be applied to
+ * include/asm-generic/vmlinux.lds.h */
+
+#define __asi_not_sensitive
+#define __asi_not_sensitive_readmostly
+
 #endif /* !_ASSEMBLY_ */
=20
 #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinu=
x.lds.h
index 42f3866bca69..c769d939c15f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -374,10 +374,26 @@
 	. =3D ALIGN(PAGE_SIZE); \
 	__nosave_end =3D .;
=20
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+#define ASI_NOT_SENSITIVE_DATA(page_align) \
+	.
=3D ALIGN(page_align); \ + __start_asi_nonsensitive =3D .; \ + *(.data..asi_non_sensitive) \ + . =3D ALIGN(page_align); \ + __end_asi_nonsensitive =3D .; \ + __start_asi_nonsensitive_readmostly =3D .; \ + *(.data..asi_non_sensitive_readmostly) \ + . =3D ALIGN(page_align); \ + __end_asi_nonsensitive_readmostly =3D .; +#else +#define ASI_NOT_SENSITIVE_DATA +#endif + #define PAGE_ALIGNED_DATA(page_align) \ . =3D ALIGN(page_align); \ *(.data..page_aligned) \ - . =3D ALIGN(page_align); + . =3D ALIGN(page_align); \ + ASI_NOT_SENSITIVE_DATA(page_align) =20 #define READ_MOSTLY_DATA(align) \ . =3D ALIGN(align); \ --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DF98C43219 for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238557AbiBWF14 (ORCPT ); Wed, 23 Feb 2022 00:27:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238416AbiBWF0x (ORCPT ); Wed, 23 Feb 2022 00:26:53 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E28136BDC2 for ; Tue, 22 Feb 2022 21:25:21 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d306e372e5so163265267b3.5 for ; Tue, 22 Feb 2022 21:25:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=KXrCYGgvo83JEgYfSU2xPmsuGvCpoOVuzDhVbsTzD0I=; b=peKSlY/Xr61DvdFpVmXc37FMKYnl+vkSCRtwe1UF1UGbHZ0v3cMEdZbiKKvXTZ5GbO 
laOcMzhj8sBkgNnt1mc4a0NZR0hCQ3gPAdN/2qIaNw43JOyvktEB0Y0rn3QcamgO9gZ8 7DbNXkvTjq6TGSgTmAj628OIdjZHlZYdFsABsewAF21y3xPSWYsHQAWzziaNW39gFRL5 yWxZv0skPs5ax5wbHVY8sW6jeIPlZLUx/HQpjPftWbdPu83nt4wFjKis+ce33NEw7HrY 03O9DgYBb/75g6K+v8UjyCDA3xzr7E/F2srmJjBiOtEMdOtKjPJnG/mlJ872cZ/zgYG/ 2wwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=KXrCYGgvo83JEgYfSU2xPmsuGvCpoOVuzDhVbsTzD0I=; b=HE8Z0sIbmS6vjfT8GYq7DLBAWgCZgIN96fMbU5W9L1iYUerybzh/vOhsxCBYoymJbR kckoquawc01xVoXdDVpUu+vJcY6kV9J03+ub5DUCU5vJVNmPkWJRsBW+hTBKUese/3l3 I9e4CH9RLw10HlFHq2S4FTeywMb/Q1vwi/fi7C7zOeWTYo+5lDNf9c7upgXXSJ7fwgU2 qr+QOaogHbmOazAOPv2zONE73Uus74glJxk7ANOzxcqR7H4z5r/rUJpSd+7zpNUD5+GG TUMuIcaLQY90Yjw2gWF2G1yadWqce06oimKkdF6uwvg4Bb4T/kub7k1VOZ6ncWHGFD/z 6mnQ== X-Gm-Message-State: AOAM5307rbX4rfwQaD3XazStNoBIVSzgR+nNBffeVmArwjX0UW/j+z0o aTswlYDLHKmsXbRj2g+pwSXnXEtWLUfQe2QaDP/1EHJSE74e22i8f4v52UyDPQE03IPkhgbELvT LK2eLzxwKruBwmmq2GClkc4R1MIwFgDTAmQmdsHATyYm9Ix+4a9yrtRXx/WyU9sh8zL/jzPqF X-Google-Smtp-Source: ABdhPJyZfLfl11T/5CrHmveifdlj9QKZBHoh8PI3AMF6sP4DdsodfKmQ8jHyUSfS7xsGTotzIsjTTydYzCuX X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:c607:0:b0:2cb:a34a:355c with SMTP id l7-20020a81c607000000b002cba34a355cmr27125747ywi.487.1645593910227; Tue, 22 Feb 2022 21:25:10 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:14 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-39-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 38/47] mm: asi: ASI annotation support for dynamic modules. 
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: Ofir Weisse, kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Ofir Weisse

Add support for using the ASI static variable annotations in dynamic
modules:
- __asi_not_sensitive
- __asi_not_sensitive_readmostly

Each module's core layout now records the following offset/size pairs:
1. asi_section_offset/size: the non-sensitive data section, which should
   be mapped into the ASI global pool
2. asi_readmostly_section_offset/size: same as above, for read-mostly data
3. once_section_offset/size: the .data.once section, which is also
   considered ASI non-sensitive

Signed-off-by: Ofir Weisse
---
 arch/x86/include/asm/asi.h |  3 ++
 arch/x86/mm/asi.c          | 66 ++++++++++++++++++++++++++++++++++++++
 include/asm-generic/asi.h  |  3 ++
 include/linux/module.h     |  9 ++++++
 kernel/module.c            | 58 +++++++++++++++++++++++++++++++++
 5 files changed, 139 insertions(+)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 6dd9c7c8a2b8..d43f6aadffee 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -98,6 +98,9 @@ static inline void asi_init_thread_state(struct thread_st=
ruct *thread)
 	thread->intr_nest_depth =3D 0;
 }
=20
+int asi_load_module(struct module* module);
+void asi_unload_module(struct module* module);
+
 static inline void asi_set_target_unrestricted(void)
 {
 	if (static_cpu_has(X86_FEATURE_ASI)) {
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 9b1bd005f343..6c14aa1fc4aa 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
=20
 #include
 #include
@@ -308,6 +309,71 @@ static int __init set_asi_param(char *str)
 }
 early_param("asi", set_asi_param);
=20
+/* asi_load_module() is called
from layout_and_allocate() in kernel/module= .c + * We map the module and its data in init_mm.asi_pgd[0]. +*/ +int asi_load_module(struct module* module) +{ + int err =3D 0; + + /* Map the cod/text */ + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base, + module->core_layout.ro_after_init_size ); + if (err) + return err; + + /* Map global variables annotated as non-sensitive for ASI */ + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_section_offset, + module->core_layout.asi_section_size ); + if (err) + return err; + + /* Map global variables annotated as non-sensitive for ASI */ + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_readmostly_section_offset, + module->core_layout.asi_readmostly_section_size); + if (err) + return err; + + /* Map .data.once section as well */ + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.once_section_offset, + module->core_layout.once_section_size ); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(asi_load_module); + +void asi_unload_module(struct module* module) +{ + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base, + module->core_layout.ro_after_init_size, true); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_section_offset, + module->core_layout.asi_section_size, true); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_readmostly_section_offset, + module->core_layout.asi_readmostly_section_size, tru= e); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.once_section_offset, + module->core_layout.once_section_size, true); + +} + static int __init asi_global_init(void) { uint i, n; diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index d9082267a5dd..2763cb1a974c 100644 --- a/include/asm-generic/asi.h +++ 
b/include/asm-generic/asi.h @@ -120,6 +120,7 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, s= ize_t len) { } =20 #define static_asi_enabled() false =20 +static inline int asi_load_module(struct module* module) {return 0;} =20 /* IMPORTANT: Any modification to the name here should also be applied to * include/asm-generic/vmlinux.lds.h */ @@ -127,6 +128,8 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, s= ize_t len) { } #define __asi_not_sensitive #define __asi_not_sensitive_readmostly =20 +static inline void asi_unload_module(struct module* module) { } + #endif /* !_ASSEMBLY_ */ =20 #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/include/linux/module.h b/include/linux/module.h index c9f1200b2312..82267a95f936 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -336,6 +336,15 @@ struct module_layout { #ifdef CONFIG_MODULES_TREE_LOOKUP struct mod_tree_node mtn; #endif + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + unsigned int asi_section_offset; + unsigned int asi_section_size; + unsigned int asi_readmostly_section_offset; + unsigned int asi_readmostly_section_size; + unsigned int once_section_offset; + unsigned int once_section_size; +#endif }; =20 #ifdef CONFIG_MODULES_TREE_LOOKUP diff --git a/kernel/module.c b/kernel/module.c index 84a9141a5e15..d363b8a0ee24 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -2159,6 +2159,8 @@ static void free_module(struct module *mod) { trace_module_free(mod); =20 + asi_unload_module(mod); + mod_sysfs_teardown(mod); =20 /* @@ -2416,6 +2418,31 @@ static bool module_init_layout_section(const char *s= name) return module_init_section(sname); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void asi_record_sections_layout(struct module *mod, + const char *sname, + Elf_Shdr *s) +{ + if (strstarts(sname, ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME)) { + mod->core_layout.asi_readmostly_section_offset =3D s->sh_e= ntsize; + mod->core_layout.asi_readmostly_section_size =3D s->sh_s= ize; + } 
+ else if (strstarts(sname, ASI_NON_SENSITIVE_SECTION_NAME)) { + mod->core_layout.asi_section_offset =3D s->sh_entsize; + mod->core_layout.asi_section_size =3D s->sh_size; + } + if (strstarts(sname, ".data.once")) { + mod->core_layout.once_section_offset =3D s->sh_entsize; + mod->core_layout.once_section_size =3D s->sh_size; + } +} +#else +static void asi_record_sections_layout(struct module *mod, + const char *sname, + Elf_Shdr *s) +{} +#endif + /* * Lay out the SHF_ALLOC sections in a way not dissimilar to how ld * might -- code, read-only data, read-write data, small data. Tally @@ -2453,6 +2480,7 @@ static void layout_sections(struct module *mod, struc= t load_info *info) || module_init_layout_section(sname)) continue; s->sh_entsize =3D get_offset(mod, &mod->core_layout.size, s, i); + asi_record_sections_layout(mod, sname, s); pr_debug("\t%s\n", sname); } switch (m) { @@ -3558,6 +3586,25 @@ static bool blacklisted(const char *module_name) } core_param(module_blacklist, module_blacklist, charp, 0400); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void asi_fix_section_size_and_alignment(struct load_info *info, + char *section_to_fix) +{ + unsigned int ndx =3D find_sec(info, section_to_fix ); + if (!ndx) + return; + + info->sechdrs[ndx].sh_addralign =3D PAGE_SIZE; + info->sechdrs[ndx].sh_size =3D + ALIGN( info->sechdrs[ndx].sh_size, PAGE_SIZE ); +} +#else +static inline void asi_fix_section_size_and_alignment(struct load_info *in= fo, + char *section_to_fix) +{} +#endif + + static struct module *layout_and_allocate(struct load_info *info, int flag= s) { struct module *mod; @@ -3600,6 +3647,15 @@ static struct module *layout_and_allocate(struct loa= d_info *info, int flags) if (ndx) info->sechdrs[ndx].sh_flags |=3D SHF_RO_AFTER_INIT; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* These are sections we will want to map into an ASI page-table. 
= We + * therefore need these sections to be aligned to a PAGE_SIZE */ + asi_fix_section_size_and_alignment(info, ASI_NON_SENSITIVE_SECTION= _NAME); + asi_fix_section_size_and_alignment(info, + ASI_NON_SENSITIVE_READ_MOSTLY_S= ECTION_NAME); + asi_fix_section_size_and_alignment(info, ".data.once"); +#endif + /* * Determine total sizes, and put offsets in sh_entsize. For now * this is done generically; there doesn't appear to be any @@ -4127,6 +4183,8 @@ static int load_module(struct load_info *info, const = char __user *uargs, /* Get rid of temporary copy. */ free_copy(info); =20 + asi_load_module(mod); + /* Done! */ trace_module_load(mod); =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5CDF1C43217 for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238517AbiBWF1y (ORCPT ); Wed, 23 Feb 2022 00:27:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238356AbiBWF0y (ORCPT ); Wed, 23 Feb 2022 00:26:54 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44B386C1C7 for ; Tue, 22 Feb 2022 21:25:23 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d306e372e5so163265737b3.5 for ; Tue, 22 Feb 2022 21:25:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Kqez4sz2gXA92s/yKdI0Sadgd/VKyd58nC75f61sH3c=; b=kY6HPVZ5+V/4ADSj5gTm7IQPleGSUbrDBXkArXIMxRp8wv39Fiygf+bF+YqLodOTpp 
AZsLSr13DzroNgqyc8IDfXzcQUL9DyFnAH4zuxHj+73S9iVBPAI2bhK64r7+IZ9iPw2j l6U45p4hnLVXIe3HFpHe4qJp9mEpOsCgXDx6xEMFZlJGN4bxSQMLkshnkvHzcWhTYszS 3RalZYOgTpQ3xR1RbA6oUKLNH9FzKB/Oq6geEXx6xglDHFlrVYGZZaZlqsNdHMvuOFaS qOuzTGhwZjX65TLhLW2uhuxlSD1GB251cNcp9nHh4Mb+0b/OHXDC3ll2brbPttKVPhId WuIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Kqez4sz2gXA92s/yKdI0Sadgd/VKyd58nC75f61sH3c=; b=KyQ1HssKgesP6p/a7EtwA0zD4ntUeLCtoub/BUFQgHwwJw0w0PryFcdzd+Z71XSuFo Fk2fbNr2bDS8WbeOclZeU1R3+fqyxJkeND3vaTwja/cSNULVbjYRKPYHWbHxhy79NPCv 3wbmFvfBGDyI4Eri9FYcxLH1v7L3+WUCtjsF/o+eB+MHxgEiP9QZVgZx5r4MhBr8slop /5STA2xFrlQEg4BvRwnnaeP8wLT4E+o80PRAH1ev210aQeDzkbNv9hJSfowJWKjW6pVa ZP3gdwrNMdV4KS3KGW90D0dtOUEX+ZarY9UUe4jSaJejEnU+Hh5uX+kOhg99bAmGMAsC eQ+Q== X-Gm-Message-State: AOAM533mu7bhLtmOx5mUW+up5hbDepYNSa10UUX0x8oIu/CD/H4zrHx6 zmvf6KTiMSASaEPO6Tl8rNnyWEk5JVAD99OhJDoYx8uo/3Ja2HuwGeTxpN6S2Pearvthya1Wpyi JVUF9HP8UQkYsS7RbeK0/OUSc8Y9WPNn3TtkXQG/0iDT+eglcUuq/lpL1OtArOQjoZDvuD4Hb X-Google-Smtp-Source: ABdhPJwRpDGu9uCaO7SUVy/qucopgnc8bRss4/HLiRU4Az2YJmA6Oz65Xuo4RY/0idM9yle+QHSTEPNKFl9M X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:34c9:0:b0:623:fc5f:b98 with SMTP id b192-20020a2534c9000000b00623fc5f0b98mr27190113yba.195.1645593912355; Tue, 22 Feb 2022 21:25:12 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:15 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-40-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 39/47] mm: asi: Skip conventional L1TF/MDS mitigations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, 
alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Ofir Weisse

If ASI is enabled for an mm, then the L1D flushes and MDS mitigations
are taken care of by ASI. We check whether ASI is enabled by checking
current->mm->asi_enabled. To use ASI, a cgroup flag must be set before
the VM process is forked, which causes the mm->asi_enabled flag to be
set.

Signed-off-by: Ofir Weisse
---
 arch/x86/kvm/vmx/vmx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e0178b57be75..6549fef39f2b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6609,7 +6609,11 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_v=
cpu *vcpu,
=20
 	kvm_guest_enter_irqoff();
=20
-	vmx_flush_sensitive_cpu_state(vcpu);
+	/* If Address Space Isolation is enabled, it will take care of L1D
+	 * flushes, and will also mitigate MDS. In other words, if no ASI -
+	 * flush sensitive cpu state.
*/ + if (!static_asi_enabled() || !mm_asi_enabled(current->mm)) + vmx_flush_sensitive_cpu_state(vcpu); =20 asi_enter(vcpu->kvm->asi); =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9951C4332F for ; Wed, 23 Feb 2022 05:28:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238379AbiBWF2j (ORCPT ); Wed, 23 Feb 2022 00:28:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238422AbiBWF1R (ORCPT ); Wed, 23 Feb 2022 00:27:17 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A51E6C1FE for ; Tue, 22 Feb 2022 21:25:24 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d61f6c1877so163279377b3.15 for ; Tue, 22 Feb 2022 21:25:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=AJ9oF4MxpZqgNWeQmqV576xo41knbB79umnOaNmXddY=; b=gDjBd/x5dYNgFpVj+6AXYinaFyQ0y0nY9xCgZVbh0gj9MDv8hxw/demmgSf5+q0lIl f5u9TMyC094RJt+uGxErIIL6jEyzHxyof4Xb7WeyC6uYfiR8WIq5qp+TlkXog/JDvZk7 2mtgyW1bRYWB4o+N58ElEO1CyDNRbfSatixMLbQR1FnYuk0Gfj5u+Ct8tTOprcSTTI3D O9Kk/stLoRds2nPZbzZ/5aJHis4MCrOVxHj7FYqsiO1KWp6tTJxhsj5RQHjcB5DUXElF yIJbJ9KqvnvVAuRlFRGpGXt2ubrzQb0wSafap3WMiTiEiFUzbC7xRedGoDqwKzVo3u9S eXvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=AJ9oF4MxpZqgNWeQmqV576xo41knbB79umnOaNmXddY=; 
b=wEycSYj/+hEkEsWchfbGj8KUYz2PwEG+dBVcAYVu6e5s3ylw3gcd0y464WXM9zxSWj LSE6cqS4z0qp4vtnXpyNQietCZA+WfJQkmZ69QhwrET/AAXkJ0kYBqys5b/o1W1WGzih 1Sir90ITJYzKJ0CtfRfF7j0nbYks1Z1CZdI5Qyla8v1xBwI0lxgXMlO/pj/eh5JEM2u+ sr6Cthq4KD6CI5LGohtCmHwdTSTwpDt9cTH4LzRJR27WE8Rxi/+Y3AWcugNZdbt8Zu6Y dyrxAnSJu6FlUGjC7V/qtfK7srYPKa429N6ducb3TYI8SvqlAQgWh40eg0waYSVXUAsj /wCQ== X-Gm-Message-State: AOAM531glZ7htlFRAKSNcwMtWGzjwQrvVOX0W3bU9CAmPjM02d0B5xf+ 4YFC13fuifi16VkuA8mGK7FNakdrvOSc92zmPr3LMpQikAU/BnBoGXZvgXfO+u0c08UAA3+J4Zj KC2KDVhmrynpicWM7ODEjySKvs872LHcbqyPL/uMPFduPzcCLycCdeUBEP171dK+PU41rRSU+ X-Google-Smtp-Source: ABdhPJxXcU94ggDYk55LTym0qr5/ST37jyZ7dcNnih2o73ZZ8KtzSOHKd++I3ohJEg/v0wMLNTTLtdFeGJBD X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:3dcb:0:b0:2ca:72dd:904c with SMTP id k194-20020a813dcb000000b002ca72dd904cmr28454275ywa.290.1645593914646; Tue, 22 Feb 2022 21:25:14 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:16 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-41-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 40/47] mm: asi: support for static percpu DEFINE_PER_CPU*_ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse Implemented the following PERCPU static declarations: - DECLARE/DEFINE_PER_CPU_ASI_NOT_SENSITIVE - DECLARE/DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE - 
DECLARE/DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE - DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE These definitions are also supported in dynamic modules. To support percpu variables in dynamic modules, we're creating an ASI pcpu reserved chunk. The reserved size PERCPU_MODULE_RESERVE is now split between the normal reserved chunk and the ASI one. Signed-off-by: Ofir Weisse --- arch/x86/mm/asi.c | 39 +++++++- include/asm-generic/percpu.h | 6 ++ include/asm-generic/vmlinux.lds.h | 5 + include/linux/module.h | 6 ++ include/linux/percpu-defs.h | 39 ++++++++ include/linux/percpu.h | 8 +- kernel/module-internal.h | 1 + kernel/module.c | 154 ++++++++++++++++++++++++++---- mm/percpu.c | 134 ++++++++++++++++++++++---- 9 files changed, 356 insertions(+), 36 deletions(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 6c14aa1fc4aa..ba373b461855 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -309,6 +309,32 @@ static int __init set_asi_param(char *str) } early_param("asi", set_asi_param); =20 +static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len) +{ + int cpu, err; + void *ptr; + + for_each_possible_cpu(cpu) { + ptr =3D per_cpu_ptr(percpu_addr, cpu); + err =3D asi_map(asi, ptr, len); + if (err) + return err; + } + + return 0; +} + +static void asi_unmap_percpu(struct asi *asi, void *percpu_addr, size_t le= n) +{ + int cpu; + void *ptr; + + for_each_possible_cpu(cpu) { + ptr =3D per_cpu_ptr(percpu_addr, cpu); + asi_unmap(asi, ptr, len, true); + } +} + /* asi_load_module() is called from layout_and_allocate() in kernel/module= .c * We map the module and its data in init_mm.asi_pgd[0]. 
*/ @@ -347,7 +373,13 @@ int asi_load_module(struct module* module) if (err) return err; =20 - return 0; + err =3D asi_map_percpu(ASI_GLOBAL_NONSENSITIVE, + module->percpu_asi, + module->percpu_asi_size ); + if (err) + return err; + + return 0; } EXPORT_SYMBOL_GPL(asi_load_module); =20 @@ -372,6 +404,9 @@ void asi_unload_module(struct module* module) module->core_layout.once_section_offset, module->core_layout.once_section_size, true); =20 + asi_unmap_percpu(ASI_GLOBAL_NONSENSITIVE, module->percpu_asi, + module->percpu_asi_size); + } =20 static int __init asi_global_init(void) @@ -399,6 +434,8 @@ static int __init asi_global_init(void) =20 static_branch_enable(&asi_local_map_initialized); =20 + pcpu_map_asi_reserved_chunk(); + return 0; } subsys_initcall(asi_global_init) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 6432a7fade91..40001b74114f 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -50,6 +50,12 @@ extern void setup_per_cpu_areas(void); =20 #endif /* SMP */ =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __init pcpu_map_asi_reserved_chunk(void); +#else +static inline void pcpu_map_asi_reserved_chunk(void) {} +#endif + #ifndef PER_CPU_BASE_SECTION #ifdef CONFIG_SMP #define PER_CPU_BASE_SECTION ".data..percpu" diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinu= x.lds.h index c769d939c15f..0a931aedc285 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -1080,6 +1080,11 @@ . =3D ALIGN(cacheline); \ *(.data..percpu) \ *(.data..percpu..shared_aligned) \ + . =3D ALIGN(PAGE_SIZE); \ + __per_cpu_asi_start =3D .; \ + *(.data..percpu..asi_non_sensitive) \ + . 
=3D ALIGN(PAGE_SIZE); \ + __per_cpu_asi_end =3D .; \ PERCPU_DECRYPTED_SECTION \ __per_cpu_end =3D .; =20 diff --git a/include/linux/module.h b/include/linux/module.h index 82267a95f936..d4d020bae171 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -463,6 +463,12 @@ struct module { /* Per-cpu data. */ void __percpu *percpu; unsigned int percpu_size; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Per-cpu data for ASI */ + void __percpu *percpu_asi; + unsigned int percpu_asi_size; +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif void *noinstr_text_start; unsigned int noinstr_text_size; diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h index af1071535de8..5d9fdc93e0fa 100644 --- a/include/linux/percpu-defs.h +++ b/include/linux/percpu-defs.h @@ -170,6 +170,45 @@ =20 #define DEFINE_PER_CPU_READ_MOSTLY(type, name) \ DEFINE_PER_CPU_SECTION(type, name, "..read_mostly") +/* + * Declaration/definition used for per-CPU variables which for the sake for + * address space isolation (ASI) are deemed not sensitive + */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define ASI_PERCPU_SECTION "..asi_non_sensitive" +#else +#define ASI_PERCPU_SECTION "" +#endif + +#define DECLARE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) + +#define DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned_in_smp + +#define DECLARE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned + +#define DECLARE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + __aligned(PAGE_SIZE) + +#define DEFINE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) + +#define DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + 
DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned_in_smp + +#define DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned + +#define DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + __aligned(PAGE_SIZE) =20 /* * Declaration/definition used for per-CPU variables that should be access= ed diff --git a/include/linux/percpu.h b/include/linux/percpu.h index ae4004e7957e..a2cc4c32cabd 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -13,7 +13,8 @@ =20 /* enough to cover all DEFINE_PER_CPUs in modules */ #ifdef CONFIG_MODULES -#define PERCPU_MODULE_RESERVE (8 << 10) +/* #define PERCPU_MODULE_RESERVE (8 << 10) */ +#define PERCPU_MODULE_RESERVE (16 << 10) #else #define PERCPU_MODULE_RESERVE 0 #endif @@ -123,6 +124,11 @@ extern int __init pcpu_page_first_chunk(size_t reserve= d_size, #endif =20 extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) _= _alloc_size(1); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern void __percpu *__alloc_reserved_percpu_asi(size_t size, size_t alig= n); +#endif + extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *= can_addr); extern bool is_kernel_percpu_address(unsigned long addr); =20 diff --git a/kernel/module-internal.h b/kernel/module-internal.h index 33783abc377b..44c05ae06b2c 100644 --- a/kernel/module-internal.h +++ b/kernel/module-internal.h @@ -25,6 +25,7 @@ struct load_info { #endif struct { unsigned int sym, str, mod, vers, info, pcpu; + unsigned int pcpu_asi; } index; }; =20 diff --git a/kernel/module.c b/kernel/module.c index d363b8a0ee24..0048b7843903 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -587,6 +587,13 @@ static inline void __percpu *mod_percpu(struct module = *mod) return mod->percpu; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static inline void __percpu 
*mod_percpu_asi(struct module *mod) +{ + return mod->percpu_asi; +} +#endif + static int percpu_modalloc(struct module *mod, struct load_info *info) { Elf_Shdr *pcpusec =3D &info->sechdrs[info->index.pcpu]; @@ -611,9 +618,34 @@ static int percpu_modalloc(struct module *mod, struct = load_info *info) return 0; } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static int percpu_asi_modalloc(struct module *mod, struct load_info *info) +{ + Elf_Shdr *pcpusec =3D &info->sechdrs[info->index.pcpu_asi]; + unsigned long align =3D pcpusec->sh_addralign; + + if ( !pcpusec->sh_size) + return 0; + + mod->percpu_asi =3D __alloc_reserved_percpu_asi(pcpusec->sh_size, align); + if (!mod->percpu_asi) { + pr_warn("%s: Could not allocate %lu bytes percpu data\n", + mod->name, (unsigned long)pcpusec->sh_size); + return -ENOMEM; + } + mod->percpu_asi_size =3D pcpusec->sh_size; + + return 0; +} +#endif + static void percpu_modfree(struct module *mod) { free_percpu(mod->percpu); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + free_percpu(mod->percpu_asi); +#endif } =20 static unsigned int find_pcpusec(struct load_info *info) @@ -621,6 +653,13 @@ static unsigned int find_pcpusec(struct load_info *inf= o) return find_sec(info, ".data..percpu"); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static unsigned int find_pcpusec_asi(struct load_info *info) +{ + return find_sec(info, ".data..percpu" ASI_PERCPU_SECTION ); +} +#endif + static void percpu_modcopy(struct module *mod, const void *from, unsigned long size) { @@ -630,6 +669,39 @@ static void percpu_modcopy(struct module *mod, memcpy(per_cpu_ptr(mod->percpu, cpu), from, size); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void percpu_asi_modcopy(struct module *mod, + const void *from, unsigned long size) +{ + int cpu; + + for_each_possible_cpu(cpu) + memcpy(per_cpu_ptr(mod->percpu_asi, cpu), from, size); +} +#endif + +bool __is_module_percpu_address_helper(unsigned long addr, + unsigned long *can_addr, + unsigned int cpu, + void* 
percpu_start, + unsigned int percpu_size) +{ + void *start =3D per_cpu_ptr(percpu_start, cpu); + void *va =3D (void *)addr; + + if (va >=3D start && va < start + percpu_size) { + if (can_addr) { + *can_addr =3D (unsigned long) (va - start); + *can_addr +=3D (unsigned long) + per_cpu_ptr(percpu_start, + get_boot_cpu_id()); + } + return true; + } + + return false; +} + bool __is_module_percpu_address(unsigned long addr, unsigned long *can_add= r) { struct module *mod; @@ -640,22 +712,34 @@ bool __is_module_percpu_address(unsigned long addr, u= nsigned long *can_addr) list_for_each_entry_rcu(mod, &modules, list) { if (mod->state =3D=3D MODULE_STATE_UNFORMED) continue; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (!mod->percpu_size && !mod->percpu_asi_size) + continue; +#else if (!mod->percpu_size) continue; +#endif for_each_possible_cpu(cpu) { - void *start =3D per_cpu_ptr(mod->percpu, cpu); - void *va =3D (void *)addr; - - if (va >=3D start && va < start + mod->percpu_size) { - if (can_addr) { - *can_addr =3D (unsigned long) (va - start); - *can_addr +=3D (unsigned long) - per_cpu_ptr(mod->percpu, - get_boot_cpu_id()); - } + if (__is_module_percpu_address_helper(addr, + can_addr, + cpu, + mod->percpu, + mod->percpu_= size)) { preempt_enable(); return true; - } + } + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (__is_module_percpu_address_helper( + addr, + can_addr, + cpu, + mod->percpu_asi, + mod->percpu_asi_si= ze)) { + preempt_enable(); + return true; + } +#endif } } =20 @@ -2344,6 +2428,10 @@ static int simplify_symbols(struct module *mod, cons= t struct load_info *info) /* Divert to percpu allocation if a percpu var. 
*/ if (sym[i].st_shndx =3D=3D info->index.pcpu) secbase =3D (unsigned long)mod_percpu(mod); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + else if (sym[i].st_shndx =3D=3D info->index.pcpu_a= si) + secbase =3D (unsigned long)mod_percpu_asi(mod); +#endif else secbase =3D info->sechdrs[sym[i].st_shndx].sh_addr; sym[i].st_value +=3D secbase; @@ -2664,6 +2752,10 @@ static char elf_type(const Elf_Sym *sym, const struc= t load_info *info) return 'U'; if (sym->st_shndx =3D=3D SHN_ABS || sym->st_shndx =3D=3D info->index.pcpu) return 'a'; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (sym->st_shndx =3D=3D info->index.pcpu_asi) + return 'a'; +#endif if (sym->st_shndx >=3D SHN_LORESERVE) return '?'; if (sechdrs[sym->st_shndx].sh_flags & SHF_EXECINSTR) @@ -2691,7 +2783,8 @@ static char elf_type(const Elf_Sym *sym, const struct= load_info *info) } =20 static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs, - unsigned int shnum, unsigned int pcpundx) + unsigned int shnum, unsigned int pcpundx, + unsigned pcpu_asi_ndx) { const Elf_Shdr *sec; =20 @@ -2701,7 +2794,7 @@ static bool is_core_symbol(const Elf_Sym *src, const = Elf_Shdr *sechdrs, return false; =20 #ifdef CONFIG_KALLSYMS_ALL - if (src->st_shndx =3D=3D pcpundx) + if (src->st_shndx =3D=3D pcpundx || src->st_shndx =3D=3D pcpu_asi_ndx ) return true; #endif =20 @@ -2743,7 +2836,7 @@ static void layout_symtab(struct module *mod, struct = load_info *info) for (ndst =3D i =3D 0; i < nsrc; i++) { if (i =3D=3D 0 || is_livepatch_module(mod) || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, - info->index.pcpu)) { + info->index.pcpu, info->index.pcpu_asi)) { strtab_size +=3D strlen(&info->strtab[src[i].st_name])+1; ndst++; } @@ -2807,7 +2900,7 @@ static void add_kallsyms(struct module *mod, const st= ruct load_info *info) mod->kallsyms->typetab[i] =3D elf_type(src + i, info); if (i =3D=3D 0 || is_livepatch_module(mod) || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, - info->index.pcpu)) { + 
info->index.pcpu, info->index.pcpu_asi)) { mod->core_kallsyms.typetab[ndst] =3D mod->kallsyms->typetab[i]; dst[ndst] =3D src[i]; @@ -3289,6 +3382,12 @@ static int setup_load_info(struct load_info *info, i= nt flags) =20 info->index.pcpu =3D find_pcpusec(info); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + info->index.pcpu_asi =3D find_pcpusec_asi(info); +#else + info->index.pcpu_asi =3D 0; +#endif + return 0; } =20 @@ -3629,6 +3728,12 @@ static struct module *layout_and_allocate(struct loa= d_info *info, int flags) /* We will do a special allocation for per-cpu sections later. */ info->sechdrs[info->index.pcpu].sh_flags &=3D ~(unsigned long)SHF_ALLOC; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (info->index.pcpu_asi) + info->sechdrs[info->index.pcpu_asi].sh_flags &=3D + ~(unsigned long)SHF_AL= LOC; +#endif + /* * Mark ro_after_init section with SHF_RO_AFTER_INIT so that * layout_sections() can put it in the right place. @@ -3700,6 +3805,14 @@ static int post_relocation(struct module *mod, const= struct load_info *info) percpu_modcopy(mod, (void *)info->sechdrs[info->index.pcpu].sh_addr, info->sechdrs[info->index.pcpu].sh_size); =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Copy relocated percpu ASI area over. */ + percpu_asi_modcopy( + mod, + (void *)info->sechdrs[info->index.pcpu_asi].sh_addr, + info->sechdrs[info->index.pcpu_asi].sh_size); +#endif + /* Setup kallsyms-specific fields. */ add_kallsyms(mod, info); =20 @@ -4094,6 +4207,11 @@ static int load_module(struct load_info *info, const= char __user *uargs, err =3D percpu_modalloc(mod, info); if (err) goto unlink_mod; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + err =3D percpu_asi_modalloc(mod, info); + if (err) + goto unlink_mod; +#endif =20 /* Now module is in final location, initialize linked lists, etc. */ err =3D module_unload_init(mod); @@ -4183,7 +4301,11 @@ static int load_module(struct load_info *info, const= char __user *uargs, /* Get rid of temporary copy. 
*/ free_copy(info); =20 - asi_load_module(mod); + err =3D asi_load_module(mod); + /* If the ASI loading failed, it doesn't necessarily mean that the + * module loading failed. We print an error and move on. */ + if (err) + pr_err("ASI: failed loading module %s", mod->name); =20 /* Done! */ trace_module_load(mod); diff --git a/mm/percpu.c b/mm/percpu.c index beaca5adf9d4..3665a5ea71ec 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -169,6 +169,10 @@ struct pcpu_chunk *pcpu_first_chunk __ro_after_init; */ struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +struct pcpu_chunk *pcpu_reserved_nonsensitive_chunk __ro_after_init; +#endif + DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */ static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, m= ap ext */ =20 @@ -1621,6 +1625,11 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(voi= d *addr) if (pcpu_addr_in_chunk(pcpu_first_chunk, addr)) return pcpu_first_chunk; =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* is it in the reserved ASI region? */ + if (pcpu_addr_in_chunk(pcpu_reserved_nonsensitive_chunk, addr)) + return pcpu_reserved_nonsensitive_chunk; +#endif /* is it in the reserved region? 
*/ if (pcpu_addr_in_chunk(pcpu_reserved_chunk, addr)) return pcpu_reserved_chunk; @@ -1805,23 +1814,37 @@ static void __percpu *pcpu_alloc(size_t size, size_= t align, bool reserved, =20 spin_lock_irqsave(&pcpu_lock, flags); =20 +#define TRY_ALLOC_FROM_CHUNK(source_chunk, chunk_name) \ +do { \ + if (!source_chunk) { \ + err =3D chunk_name " chunk not allocated"; \ + goto fail_unlock; \ + } \ + chunk =3D source_chunk; \ + \ + off =3D pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); \ + if (off < 0) { \ + err =3D "alloc from " chunk_name " chunk failed"; \ + goto fail_unlock; \ + } \ + \ + off =3D pcpu_alloc_area(chunk, bits, bit_align, off); \ + if (off >=3D 0) \ + goto area_found; \ + \ + err =3D "alloc from " chunk_name " chunk failed"; \ + goto fail_unlock; \ +} while(0) + /* serve reserved allocations from the reserved chunk if available */ - if (reserved && pcpu_reserved_chunk) { - chunk =3D pcpu_reserved_chunk; - - off =3D pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); - if (off < 0) { - err =3D "alloc from reserved chunk failed"; - goto fail_unlock; - } - - off =3D pcpu_alloc_area(chunk, bits, bit_align, off); - if (off >=3D 0) - goto area_found; - - err =3D "alloc from reserved chunk failed"; - goto fail_unlock; - } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (reserved && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + TRY_ALLOC_FROM_CHUNK(pcpu_reserved_nonsensitive_chunk, + "reserved ASI"); + else +#endif + if (reserved && pcpu_reserved_chunk) + TRY_ALLOC_FROM_CHUNK(pcpu_reserved_chunk, "reserved"); =20 restart: /* search through normal chunks */ @@ -1998,6 +2021,14 @@ void __percpu *__alloc_reserved_percpu(size_t size, = size_t align) return pcpu_alloc(size, align, true, GFP_KERNEL); } =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __percpu *__alloc_reserved_percpu_asi(size_t size, size_t align) +{ + return pcpu_alloc(size, align, true, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); +} +#endif + /** * pcpu_balance_free - manage the amount of free 
chunks * @empty_only: free chunks only if there are no populated pages @@ -2838,15 +2869,46 @@ void __init pcpu_setup_first_chunk(const struct pcp= u_alloc_info *ai, * the dynamic region. */ tmp_addr =3D (unsigned long)base_addr + static_size; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* If ASI is used, split the reserved size between the nonsensitive + * chunk and the normal chunk evenly. */ + map_size =3D (ai->reserved_size / 2) ?: dyn_size; +#else map_size =3D ai->reserved_size ?: dyn_size; +#endif chunk =3D pcpu_alloc_first_chunk(tmp_addr, map_size); =20 /* init dynamic chunk if necessary */ if (ai->reserved_size) { - pcpu_reserved_chunk =3D chunk; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* TODO: check if ASI was enabled via boot param or static branc= h */ + /* We allocated pcpu_reserved_nonsensitive_chunk only if + * pcpu_reserved_chunk is used as well. */ + pcpu_reserved_nonsensitive_chunk =3D chunk; + pcpu_reserved_nonsensitive_chunk->is_asi_nonsensitive =3D = true; =20 + /* We used the previous chunk as pcpu_reserved_nonsensitiv= e_chunk. Now + * allocate pcpu_reserved_chunk */ + tmp_addr =3D (unsigned long)base_addr + static_size + + (ai->reserved_size / 2); + map_size =3D ai->reserved_size / 2; + chunk =3D pcpu_alloc_first_chunk(tmp_addr, map_size); +#endif + /* Whether ASI is enabled or disabled, the end result is t= he + * same: + * If ASI is enabled, tmp_addr, used for pcpu_first_chunk = should + * be after + * 1. pcpu_reserved_nonsensitive_chunk AND + * 2. pcpu_reserved_chunk + * Since we split the reserve size in half, we skip in tot= al the + * whole ai->reserved_size. 
+ * If ASI is disabled, tmp_addr, used for pcpu_first_chunk= is + * just after pcpu_reserved_chunk */ tmp_addr =3D (unsigned long)base_addr + static_size + ai->reserved_size; + + pcpu_reserved_chunk =3D chunk; + map_size =3D dyn_size; chunk =3D pcpu_alloc_first_chunk(tmp_addr, map_size); } @@ -3129,7 +3191,6 @@ int __init pcpu_embed_first_chunk(size_t reserved_siz= e, size_t dyn_size, cpu_distance_fn); if (IS_ERR(ai)) return PTR_ERR(ai); - size_sum =3D ai->static_size + ai->reserved_size + ai->dyn_size; areas_size =3D PFN_ALIGN(ai->nr_groups * sizeof(void *)); =20 @@ -3460,3 +3521,40 @@ static int __init percpu_enable_async(void) return 0; } subsys_initcall(percpu_enable_async); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __init pcpu_map_asi_reserved_chunk(void) +{ + void *start_addr, *end_addr; + unsigned long map_start_addr, map_end_addr; + struct pcpu_chunk *chunk =3D pcpu_reserved_nonsensitive_chunk; + int err =3D 0; + + if (!chunk) + return; + + start_addr =3D chunk->base_addr + chunk->start_offset; + end_addr =3D chunk->base_addr + chunk->nr_pages * PAGE_SIZE - + chunk->end_offset; + + + /* No need for asi_map_percpu, since these addresses are "real". The + * chunk has full pages allocated, so we're not worried about leak= age of + * data caused by start_addr-->end_addr not being page aligned. as= i_map, + * however, will fail/crash if the addresses are not aligned. */ + map_start_addr =3D (unsigned long)start_addr & PAGE_MASK; + map_end_addr =3D PAGE_ALIGN((unsigned long)end_addr); + + pr_err("%s:%d mapping 0x%lx --> 0x%lx", + __FUNCTION__, __LINE__, map_start_addr, map_end_addr); + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, + (void*)map_start_addr, map_end_addr - map_start_addr= ); + + WARN(err, "Failed mapping percpu reserved chunk into ASI"); + + /* If we couldn't map the chunk into ASI, it is useless. Set the = chunk + * to NULL, so allocations from it will fail. 
*/ + if (err) + pcpu_reserved_nonsensitive_chunk =3D NULL; +} +#endif --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A435C433F5 for ; Wed, 23 Feb 2022 05:28:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238392AbiBWF2o (ORCPT ); Wed, 23 Feb 2022 00:28:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238373AbiBWF1R (ORCPT ); Wed, 23 Feb 2022 00:27:17 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A4C26D4C8 for ; Tue, 22 Feb 2022 21:25:25 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id b11-20020a5b008b000000b00624ea481d55so2533663ybp.19 for ; Tue, 22 Feb 2022 21:25:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=fjoegs+Ro+S0LiOfkKDHuccTriIwowqx+uIdOr3iex0=; b=EbHx41T8pBJ5A2BSG8qjfuD65qMmT14wI+rGz7MtVYSyZZTtXAQ9SSi490wWxvTCX3 lF1yqHJ25Ogc1hfXuKo9v+hg5YjvOasU7cMgAwTxB6T3rHmXhsMhhA64NTfNq1j1V9t4 TXGaFy+GztCl/Zph97U1y4UYZaT827j5iNNm3e+F0mQoJhUP0sknSp9qrgRx9gx8cRMf Kb0JwVGCNa+GNMz65PcJow71XH07p87xccnzELq309vUDOD82kYnpx/YluPwYOrivr7t AT7gGVWsZa9CQMSKMpbyZuKYicDY2/ciGXtsoA+y078oLzomCpgRmU1bPsX32QTmpazD udew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=fjoegs+Ro+S0LiOfkKDHuccTriIwowqx+uIdOr3iex0=; b=MNT+7cOPZu0qpt5aPFUMkAkYrA9wQI40zZf9AhaXzypq9XZA7hPQjVIWTzIw74pdIF 
dt9gx080pHa5OjDIvpL2Wfiq/ikFaz5GbRZ9kTdIjhK5WntqLU/ogZHhl3vGGgMyEr+W ADo0EA4nTqwa9+pjJubjSgdXxZ8KMoI+2LN9g24tH+Yi/vQ9VocbAMNbBBfy78g26LaX tRLKvh1VktszlXAmos2hl5ziHG1RIzzbrF24Yu5tHOhB771RELIrXd4XMMpy/FTa+gDz D3AXytK+mFSRkwc8zvWwi17xgaSANeBnuI5JmaMArRLdyfFqZryn8OB+JXm/zoe/E1o2 tw4Q== X-Gm-Message-State: AOAM532vqnkUa7Tl7eymCJ9dEoZlNFNnIFt4V5m4Qd3eNEjUhBw7+J6K 3gzPKK8YtXj/9pLP4Dfxw/cmO1p3QhApd7F3ZnPICdZXQBZG2fsgSVwMJ2NY7xBmfQd0Jpd5QMP ueVthmWlaSP1gwyqDS0bgAtLoSFehGppj1ijkjpovhivzGaZzwB+V6gEx0FMOvBy9QR9vkoZh X-Google-Smtp-Source: ABdhPJyEpvnRe67yGfdP/oVogGTIG0cZjG1Fj1fU4LBOman77N8SUGlY7TMto2pQXc1VwXiknsbYEeXFLlaJ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a5b:cc8:0:b0:622:e87:2087 with SMTP id e8-20020a5b0cc8000000b006220e872087mr26256339ybr.106.1645593916871; Tue, 22 Feb 2022 21:25:16 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:17 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-42-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 41/47] mm: asi: Annotation of static variables to be nonsensitive From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse The heart of ASI is to differentiate between sensitive and non-sensitive data access. This commit marks certain static variables as not sensitive. Some static variables are accessed frequently and therefore would cause many ASI exits.
The frequency of these accesses is monitored by tracing asi_exits and analyzing the accessed addresses. Many of these variables don't contain sensitive information and can therefore be mapped into the global ASI region. This commit applies the __asi_not_sensitive* attributes to these frequently accessed yet not sensitive variables. The end result is a very significant reduction in ASI exits on real benchmarks. Signed-off-by: Ofir Weisse --- arch/x86/events/core.c | 4 ++-- arch/x86/events/intel/core.c | 2 +- arch/x86/events/msr.c | 2 +- arch/x86/events/perf_event.h | 2 +- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kernel/alternative.c | 2 +- arch/x86/kernel/cpu/bugs.c | 2 +- arch/x86/kernel/setup.c | 4 ++-- arch/x86/kernel/smp.c | 2 +- arch/x86/kernel/tsc.c | 8 +++---- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/mmu/spte.c | 2 +- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/mtrr.c | 2 +- arch/x86/kvm/vmx/capabilities.h | 14 ++++++------ arch/x86/kvm/vmx/vmx.c | 37 ++++++++++++++++--------------- arch/x86/kvm/x86.c | 35 +++++++++++++++-------------- arch/x86/mm/asi.c | 4 ++-- include/linux/debug_locks.h | 4 ++-- include/linux/jiffies.h | 4 ++-- include/linux/notifier.h | 2 +- include/linux/profile.h | 2 +- include/linux/rcupdate.h | 4 +++- include/linux/rcutree.h | 2 +- include/linux/sched/sysctl.h | 1 + init/main.c | 2 +- kernel/cgroup/cgroup.c | 5 +++-- kernel/cpu.c | 14 ++++++------ kernel/events/core.c | 4 ++-- kernel/freezer.c | 2 +- kernel/locking/lockdep.c | 14 ++++++------ kernel/panic.c | 2 +- kernel/printk/printk.c | 4 ++-- kernel/profile.c | 4 ++-- kernel/rcu/tree.c | 10 ++++----- kernel/rcu/update.c | 4 ++-- kernel/sched/clock.c | 2 +- kernel/sched/core.c | 6 ++--- kernel/sched/cpuacct.c | 2 +- kernel/sched/cputime.c | 2 +- kernel/sched/fair.c | 4 ++-- kernel/sched/loadavg.c | 2 +- kernel/sched/rt.c | 2 +- kernel/sched/sched.h | 4 ++-- kernel/smp.c | 2 +- kernel/softirq.c | 3 ++- kernel/time/hrtimer.c | 2 +- kernel/time/jiffies.c | 8 ++++++- 
kernel/time/ntp.c | 30 ++++++++++++------------- kernel/time/tick-common.c | 4 ++-- kernel/time/tick-internal.h | 2 +- kernel/time/tick-sched.c | 2 +- kernel/time/timekeeping.c | 10 ++++----- kernel/time/timekeeping.h | 2 +- kernel/time/timer.c | 2 +- kernel/trace/trace.c | 2 +- kernel/trace/trace_sched_switch.c | 4 ++-- lib/debug_locks.c | 5 +++-- mm/memory.c | 2 +- mm/page_alloc.c | 2 +- mm/sparse.c | 4 ++-- virt/kvm/kvm_main.c | 2 +- 62 files changed, 170 insertions(+), 156 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 38b2c779146f..db825bf053fd 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -44,7 +44,7 @@ =20 #include "perf_event.h" =20 -struct x86_pmu x86_pmu __read_mostly; +struct x86_pmu x86_pmu __asi_not_sensitive_readmostly; static struct pmu pmu; =20 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) =3D { @@ -2685,7 +2685,7 @@ static int x86_pmu_filter_match(struct perf_event *ev= ent) return 1; } =20 -static struct pmu pmu =3D { +static struct pmu pmu __asi_not_sensitive =3D { .pmu_enable =3D x86_pmu_enable, .pmu_disable =3D x86_pmu_disable, =20 diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index ec6444f2c9dc..5b2b7473b2f2 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -189,7 +189,7 @@ static struct event_constraint intel_slm_event_constrai= nts[] __read_mostly =3D EVENT_CONSTRAINT_END }; =20 -static struct event_constraint intel_skl_event_constraints[] =3D { +static struct event_constraint intel_skl_event_constraints[] __asi_not_sen= sitive =3D { FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c index 96c775abe31f..db7bca37c726 100644 --- a/arch/x86/events/msr.c +++ b/arch/x86/events/msr.c @@ -280,7 +280,7 @@ static int msr_event_add(struct 
perf_event *event, int = flags) return 0; } =20 -static struct pmu pmu_msr =3D { +static struct pmu pmu_msr __asi_not_sensitive =3D { .task_ctx_nr =3D perf_sw_context, .attr_groups =3D attr_groups, .event_init =3D msr_event_init, diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 5480db242083..27cca7fd6f17 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1020,7 +1020,7 @@ static struct perf_pmu_format_hybrid_attr format_attr= _hybrid_##_name =3D {\ } =20 struct pmu *x86_get_pmu(unsigned int cpu); -extern struct x86_pmu x86_pmu __read_mostly; +extern struct x86_pmu x86_pmu __asi_not_sensitive_readmostly; =20 static __always_inline struct x86_perf_task_context_opt *task_context_opt(= void *ctx) { diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 8ba88bbcf895..b7292c4fece7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1542,8 +1542,8 @@ struct kvm_arch_async_pf { =20 extern u32 __read_mostly kvm_nr_uret_msrs; extern u64 __read_mostly host_efer; -extern bool __read_mostly allow_smaller_maxphyaddr; -extern bool __read_mostly enable_apicv; +extern bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr; +extern bool __asi_not_sensitive_readmostly enable_apicv; extern struct kvm_x86_ops kvm_x86_ops; =20 #define KVM_X86_OP(func) \ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 23fb4d51a5da..9836ebe953ed 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -31,7 +31,7 @@ #include #include =20 -int __read_mostly alternatives_patched; +int __asi_not_sensitive alternatives_patched; =20 EXPORT_SYMBOL_GPL(alternatives_patched); =20 diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1c1f218a701d..6b5e6574e391 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -46,7 +46,7 @@ static void __init srbds_select_mitigation(void); static 
void __init l1d_flush_select_mitigation(void); =20 /* The base value of the SPEC_CTRL MSR that always has to be preserved. */ -u64 x86_spec_ctrl_base; +u64 x86_spec_ctrl_base __asi_not_sensitive; EXPORT_SYMBOL_GPL(x86_spec_ctrl_base); static DEFINE_MUTEX(spec_ctrl_mutex); =20 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index e04f5e6eb33f..d8461ac88b36 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -116,7 +116,7 @@ static struct resource bss_resource =3D { struct cpuinfo_x86 new_cpu_data; =20 /* Common CPU data for all CPUs */ -struct cpuinfo_x86 boot_cpu_data __read_mostly; +struct cpuinfo_x86 boot_cpu_data __asi_not_sensitive_readmostly; EXPORT_SYMBOL(boot_cpu_data); =20 unsigned int def_to_bigsmp; @@ -133,7 +133,7 @@ struct ist_info ist_info; #endif =20 #else -struct cpuinfo_x86 boot_cpu_data __read_mostly; +struct cpuinfo_x86 boot_cpu_data __asi_not_sensitive_readmostly; EXPORT_SYMBOL(boot_cpu_data); #endif =20 diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 06db901fabe8..e9e10ffc2ec2 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -257,7 +257,7 @@ static int __init nonmi_ipi_setup(char *str) =20 __setup("nonmi_ipi", nonmi_ipi_setup); =20 -struct smp_ops smp_ops =3D { +struct smp_ops smp_ops __asi_not_sensitive =3D { .smp_prepare_boot_cpu =3D native_smp_prepare_boot_cpu, .smp_prepare_cpus =3D native_smp_prepare_cpus, .smp_cpus_done =3D native_smp_cpus_done, diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a698196377be..d7169da99b01 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -30,10 +30,10 @@ #include #include =20 -unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ +unsigned int __asi_not_sensitive_readmostly cpu_khz; /* TSC clocks / usec,= not used here */ EXPORT_SYMBOL(cpu_khz); =20 -unsigned int __read_mostly tsc_khz; +unsigned int __asi_not_sensitive_readmostly tsc_khz; EXPORT_SYMBOL(tsc_khz); =20 #define KHZ 1000 @@ -41,7 
+41,7 @@ EXPORT_SYMBOL(tsc_khz); /* * TSC can be unstable due to cpufreq or due to unsynced TSCs */ -static int __read_mostly tsc_unstable; +static int __asi_not_sensitive_readmostly tsc_unstable; static unsigned int __initdata tsc_early_khz; =20 static DEFINE_STATIC_KEY_FALSE(__use_tsc); @@ -1146,7 +1146,7 @@ static struct clocksource clocksource_tsc_early =3D { * this one will immediately take over. We will only register if TSC has * been found good. */ -static struct clocksource clocksource_tsc =3D { +static struct clocksource clocksource_tsc __asi_not_sensitive =3D { .name =3D "tsc", .rating =3D 300, .read =3D read_tsc, diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index f206fc35deff..213bbdfab49e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -60,7 +60,7 @@ #define MAX_APIC_VECTOR 256 #define APIC_VECTORS_PER_REG 32 =20 -static bool lapic_timer_advance_dynamic __read_mostly; +static bool lapic_timer_advance_dynamic __asi_not_sensitive_readmostly; #define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100 /* clock cycles */ #define LAPIC_TIMER_ADVANCE_ADJUST_MAX 10000 /* clock cycles */ #define LAPIC_TIMER_ADVANCE_NS_INIT 1000 diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 0c76c45fdb68..13038fae5088 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -33,7 +33,7 @@ u64 __read_mostly shadow_mmio_mask; u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_mask; -u64 __read_mostly shadow_acc_track_mask; +u64 __asi_not_sensitive_readmostly shadow_acc_track_mask; =20 u64 __read_mostly shadow_nonpresent_or_rsvd_mask; u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index cc432f9a966b..d1af03f63009 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -151,7 +151,7 @@ extern u64 __read_mostly shadow_me_mask; * shadow_acc_track_mask is the set of bits to be cleared in 
non-accessed * pages. */ -extern u64 __read_mostly shadow_acc_track_mask; +extern u64 __asi_not_sensitive_readmostly shadow_acc_track_mask; =20 /* * This mask must be set on all non-zero Non-Present or Reserved SPTEs in = order diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index a8502e02f479..66228abfa9fa 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -138,7 +138,7 @@ struct fixed_mtrr_segment { int range_start; }; =20 -static struct fixed_mtrr_segment fixed_seg_table[] =3D { +static struct fixed_mtrr_segment fixed_seg_table[] __asi_not_sensitive =3D= { /* MSR_MTRRfix64K_00000, 1 unit. 64K fixed mtrr. */ { .start =3D 0x0, diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilitie= s.h index 4705ad55abb5..0ab03ec7d6d0 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -6,13 +6,13 @@ =20 #include "lapic.h" =20 -extern bool __read_mostly enable_vpid; -extern bool __read_mostly flexpriority_enabled; -extern bool __read_mostly enable_ept; -extern bool __read_mostly enable_unrestricted_guest; -extern bool __read_mostly enable_ept_ad_bits; -extern bool __read_mostly enable_pml; -extern int __read_mostly pt_mode; +extern bool __asi_not_sensitive_readmostly enable_vpid; +extern bool __asi_not_sensitive_readmostly flexpriority_enabled; +extern bool __asi_not_sensitive_readmostly enable_ept; +extern bool __asi_not_sensitive_readmostly enable_unrestricted_guest; +extern bool __asi_not_sensitive_readmostly enable_ept_ad_bits; +extern bool __asi_not_sensitive_readmostly enable_pml; +extern int __asi_not_sensitive_readmostly pt_mode; =20 #define PT_MODE_SYSTEM 0 #define PT_MODE_HOST_GUEST 1 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6549fef39f2b..e1ad82c25a78 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -78,29 +78,29 @@ static const struct x86_cpu_id vmx_cpu_id[] =3D { MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id); #endif =20 -bool __read_mostly enable_vpid =3D 1; 
+bool __asi_not_sensitive_readmostly enable_vpid =3D 1; module_param_named(vpid, enable_vpid, bool, 0444); =20 -static bool __read_mostly enable_vnmi =3D 1; +static bool __asi_not_sensitive_readmostly enable_vnmi =3D 1; module_param_named(vnmi, enable_vnmi, bool, S_IRUGO); =20 -bool __read_mostly flexpriority_enabled =3D 1; +bool __asi_not_sensitive_readmostly flexpriority_enabled =3D 1; module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO); =20 -bool __read_mostly enable_ept =3D 1; +bool __asi_not_sensitive_readmostly enable_ept =3D 1; module_param_named(ept, enable_ept, bool, S_IRUGO); =20 -bool __read_mostly enable_unrestricted_guest =3D 1; +bool __asi_not_sensitive_readmostly enable_unrestricted_guest =3D 1; module_param_named(unrestricted_guest, enable_unrestricted_guest, bool, S_IRUGO); =20 -bool __read_mostly enable_ept_ad_bits =3D 1; +bool __asi_not_sensitive_readmostly enable_ept_ad_bits =3D 1; module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO); =20 -static bool __read_mostly emulate_invalid_guest_state =3D true; +static bool __asi_not_sensitive_readmostly emulate_invalid_guest_state =3D= true; module_param(emulate_invalid_guest_state, bool, S_IRUGO); =20 -static bool __read_mostly fasteoi =3D 1; +static bool __asi_not_sensitive_readmostly fasteoi =3D 1; module_param(fasteoi, bool, S_IRUGO); =20 module_param(enable_apicv, bool, S_IRUGO); @@ -110,13 +110,13 @@ module_param(enable_apicv, bool, S_IRUGO); * VMX and be a hypervisor for its own guests. If nested=3D0, guests may n= ot * use VMX instructions. 
*/ -static bool __read_mostly nested =3D 1; +static bool __asi_not_sensitive_readmostly nested =3D 1; module_param(nested, bool, S_IRUGO); =20 -bool __read_mostly enable_pml =3D 1; +bool __asi_not_sensitive_readmostly enable_pml =3D 1; module_param_named(pml, enable_pml, bool, S_IRUGO); =20 -static bool __read_mostly dump_invalid_vmcs =3D 0; +static bool __asi_not_sensitive_readmostly dump_invalid_vmcs =3D 0; module_param(dump_invalid_vmcs, bool, 0644); =20 #define MSR_BITMAP_MODE_X2APIC 1 @@ -125,13 +125,13 @@ module_param(dump_invalid_vmcs, bool, 0644); #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL =20 /* Guest_tsc -> host_tsc conversion requires 64-bit division. */ -static int __read_mostly cpu_preemption_timer_multi; -static bool __read_mostly enable_preemption_timer =3D 1; +static int __asi_not_sensitive_readmostly cpu_preemption_timer_multi; +static bool __asi_not_sensitive_readmostly enable_preemption_timer =3D 1; #ifdef CONFIG_X86_64 module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUG= O); #endif =20 -extern bool __read_mostly allow_smaller_maxphyaddr; +extern bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr; module_param(allow_smaller_maxphyaddr, bool, S_IRUGO); =20 #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD) @@ -202,7 +202,7 @@ static unsigned int ple_window_max =3D KVM_VMX_D= EFAULT_PLE_WINDOW_MAX; module_param(ple_window_max, uint, 0444); =20 /* Default is SYSTEM mode, 1 for host-guest mode */ -int __read_mostly pt_mode =3D PT_MODE_SYSTEM; +int __asi_not_sensitive_readmostly pt_mode =3D PT_MODE_SYSTEM; module_param(pt_mode, int, S_IRUGO); =20 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); @@ -421,7 +421,7 @@ static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on= _cpu); static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS); static DEFINE_SPINLOCK(vmx_vpid_lock); =20 -struct vmcs_config vmcs_config; +struct vmcs_config vmcs_config __asi_not_sensitive; struct vmx_capability vmx_capability; 
=20 #define VMX_SEGMENT_FIELD(seg) \ @@ -453,7 +453,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_= vmx *vmx) vmx->segment_cache.bitmask =3D 0; } =20 -static unsigned long host_idt_base; +static unsigned long host_idt_base __asi_not_sensitive; =20 #if IS_ENABLED(CONFIG_HYPERV) static bool __read_mostly enlightened_vmcs =3D true; @@ -5549,7 +5549,8 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vc= pu) * may resume. Otherwise they set the kvm_run parameter to indicate what = needs * to be done to userspace and return 0. */ -static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) =3D { +static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) __asi_not_sen= sitive +=3D { [EXIT_REASON_EXCEPTION_NMI] =3D handle_exception_nmi, [EXIT_REASON_EXTERNAL_INTERRUPT] =3D handle_external_interrupt, [EXIT_REASON_TRIPLE_FAULT] =3D handle_triple_fault, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d0df14deae80..0df88eadab60 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -123,7 +123,7 @@ static int sync_regs(struct kvm_vcpu *vcpu); static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2); static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2); =20 -struct kvm_x86_ops kvm_x86_ops __read_mostly; +struct kvm_x86_ops kvm_x86_ops __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(kvm_x86_ops); =20 #define KVM_X86_OP(func) \ @@ -148,17 +148,17 @@ module_param(min_timer_period_us, uint, S_IRUGO | S_I= WUSR); static bool __read_mostly kvmclock_periodic_sync =3D true; module_param(kvmclock_periodic_sync, bool, S_IRUGO); =20 -bool __read_mostly kvm_has_tsc_control; +bool __asi_not_sensitive_readmostly kvm_has_tsc_control; EXPORT_SYMBOL_GPL(kvm_has_tsc_control); -u32 __read_mostly kvm_max_guest_tsc_khz; +u32 __asi_not_sensitive_readmostly kvm_max_guest_tsc_khz; EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz); -u8 __read_mostly kvm_tsc_scaling_ratio_frac_bits; +u8 __asi_not_sensitive_readmostly 
kvm_tsc_scaling_ratio_frac_bits; EXPORT_SYMBOL_GPL(kvm_tsc_scaling_ratio_frac_bits); -u64 __read_mostly kvm_max_tsc_scaling_ratio; +u64 __asi_not_sensitive_readmostly kvm_max_tsc_scaling_ratio; EXPORT_SYMBOL_GPL(kvm_max_tsc_scaling_ratio); -u64 __read_mostly kvm_default_tsc_scaling_ratio; +u64 __asi_not_sensitive_readmostly kvm_default_tsc_scaling_ratio; EXPORT_SYMBOL_GPL(kvm_default_tsc_scaling_ratio); -bool __read_mostly kvm_has_bus_lock_exit; +bool __asi_not_sensitive_readmostly kvm_has_bus_lock_exit; EXPORT_SYMBOL_GPL(kvm_has_bus_lock_exit); =20 /* tsc tolerance in parts per million - default to 1/2 of the NTP threshol= d */ @@ -171,20 +171,20 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWU= SR); * advancement entirely. Any other value is used as-is and disables adapt= ive * tuning, i.e. allows privileged userspace to set an exact advancement ti= me. */ -static int __read_mostly lapic_timer_advance_ns =3D -1; +static int __asi_not_sensitive_readmostly lapic_timer_advance_ns =3D -1; module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR); =20 -static bool __read_mostly vector_hashing =3D true; +static bool __asi_not_sensitive_readmostly vector_hashing =3D true; module_param(vector_hashing, bool, S_IRUGO); =20 -bool __read_mostly enable_vmware_backdoor =3D false; +bool __asi_not_sensitive_readmostly enable_vmware_backdoor =3D false; module_param(enable_vmware_backdoor, bool, S_IRUGO); EXPORT_SYMBOL_GPL(enable_vmware_backdoor); =20 -static bool __read_mostly force_emulation_prefix =3D false; +static bool __asi_not_sensitive_readmostly force_emulation_prefix =3D fals= e; module_param(force_emulation_prefix, bool, S_IRUGO); =20 -int __read_mostly pi_inject_timer =3D -1; +int __asi_not_sensitive_readmostly pi_inject_timer =3D -1; module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); =20 /* @@ -216,13 +216,14 @@ static struct kvm_user_return_msrs __percpu *user_ret= urn_msrs; u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer); =20 -bool 
__read_mostly allow_smaller_maxphyaddr =3D 0; +bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr =3D 0; EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr); =20 -bool __read_mostly enable_apicv =3D true; +bool __asi_not_sensitive_readmostly enable_apicv =3D true; EXPORT_SYMBOL_GPL(enable_apicv); =20 -u64 __read_mostly host_xss; +/* TODO(oweisse): how dangerous is this variable, from a security standpoi= nt? */ +u64 __asi_not_sensitive_readmostly host_xss; EXPORT_SYMBOL_GPL(host_xss); u64 __read_mostly supported_xss; EXPORT_SYMBOL_GPL(supported_xss); @@ -292,7 +293,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header =3D= { sizeof(kvm_vcpu_stats_desc), }; =20 -u64 __read_mostly host_xcr0; +u64 __asi_not_sensitive_readmostly host_xcr0; u64 __read_mostly supported_xcr0; EXPORT_SYMBOL_GPL(supported_xcr0); =20 @@ -2077,7 +2078,7 @@ struct pvclock_gtod_data { u64 wall_time_sec; }; =20 -static struct pvclock_gtod_data pvclock_gtod_data; +static struct pvclock_gtod_data pvclock_gtod_data __asi_not_sensitive; =20 static void update_pvclock_gtod(struct timekeeper *tk) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ba373b461855..fdc117929fc7 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -17,8 +17,8 @@ #undef pr_fmt #define pr_fmt(fmt) "ASI: " fmt =20 -static struct asi_class asi_class[ASI_MAX_NUM]; -static DEFINE_SPINLOCK(asi_class_lock); +static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; +static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); =20 DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h index dbb409d77d4f..7bd0c3dd6d47 100644 --- a/include/linux/debug_locks.h +++ b/include/linux/debug_locks.h @@ -7,8 +7,8 @@ =20 struct task_struct; =20 -extern int debug_locks __read_mostly; -extern int debug_locks_silent __read_mostly; +extern int debug_locks; +extern int debug_locks_silent; =20 =20 static 
__always_inline int __debug_locks_off(void) diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 5e13f801c902..deccab0dcb4a 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -76,8 +76,8 @@ extern int register_refined_jiffies(long clock_tick_rate); * without sampling the sequence number in jiffies_lock. * get_jiffies_64() will do this for you as appropriate. */ -extern u64 __cacheline_aligned_in_smp jiffies_64; -extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data= jiffies; +extern u64 jiffies_64; +extern unsigned long volatile __jiffy_arch_data jiffies; =20 #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void); diff --git a/include/linux/notifier.h b/include/linux/notifier.h index 87069b8459af..a27b193b8e60 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -117,7 +117,7 @@ extern void srcu_init_notifier_head(struct srcu_notifie= r_head *nh); struct blocking_notifier_head name =3D \ BLOCKING_NOTIFIER_INIT(name) #define RAW_NOTIFIER_HEAD(name) \ - struct raw_notifier_head name =3D \ + struct raw_notifier_head name __asi_not_sensitive =3D \ RAW_NOTIFIER_INIT(name) =20 #ifdef CONFIG_TREE_SRCU diff --git a/include/linux/profile.h b/include/linux/profile.h index fd18ca96f557..4988b6d05d4c 100644 --- a/include/linux/profile.h +++ b/include/linux/profile.h @@ -38,7 +38,7 @@ enum profile_type { =20 #ifdef CONFIG_PROFILING =20 -extern int prof_on __read_mostly; +extern int prof_on; =20 /* init basic kernel profiler */ int profile_init(void); diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 5e0beb5c5659..34f5073c88a2 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -84,7 +84,7 @@ static inline int rcu_preempt_depth(void) =20 /* Internal to kernel */ void rcu_init(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_sched_clock_irq(int user); void rcu_report_dead(unsigned int cpu); void 
rcutree_migrate_callbacks(int cpu); @@ -308,6 +308,8 @@ static inline int rcu_read_lock_any_held(void) =20 #ifdef CONFIG_PROVE_RCU =20 +/* TODO: ASI - (oweisse) we might want to switch ".data.unlikely" to some = other + * section that will be mapped to ASI. */ /** * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met * @c: condition to check diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 53209d669400..76665db179fa 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -62,7 +62,7 @@ static inline void rcu_irq_exit_check_preempt(void) { } void exit_rcu(void); =20 void rcu_scheduler_starting(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_end_inkernel_boot(void); bool rcu_inkernel_boot_has_ended(void); bool rcu_is_watching(void); diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 304f431178fd..1529e3835939 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -3,6 +3,7 @@ #define _LINUX_SCHED_SYSCTL_H =20 #include +#include =20 struct ctl_table; =20 diff --git a/init/main.c b/init/main.c index bb984ed79de0..ce87fac83aed 100644 --- a/init/main.c +++ b/init/main.c @@ -123,7 +123,7 @@ extern void radix_tree_init(void); * operations which are not allowed with IRQ disabled are allowed while the * flag is set. 
*/ -bool early_boot_irqs_disabled __read_mostly; +bool early_boot_irqs_disabled __asi_not_sensitive; =20 enum system_states system_state __read_mostly; EXPORT_SYMBOL(system_state); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index cafb8c114a21..729495e17363 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -162,7 +162,8 @@ static struct static_key_true *cgroup_subsys_on_dfl_key= [] =3D { static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu); =20 /* the default hierarchy */ -struct cgroup_root cgrp_dfl_root =3D { .cgrp.rstat_cpu =3D &cgrp_dfl_root_= rstat_cpu }; +struct cgroup_root cgrp_dfl_root __asi_not_sensitive =3D + { .cgrp.rstat_cpu =3D &cgrp_dfl_root_rstat_cpu }; EXPORT_SYMBOL_GPL(cgrp_dfl_root); =20 /* @@ -755,7 +756,7 @@ EXPORT_SYMBOL_GPL(of_css); * reference-counted, to improve performance when child cgroups * haven't been created. */ -struct css_set init_css_set =3D { +struct css_set init_css_set __asi_not_sensitive =3D { .refcount =3D REFCOUNT_INIT(1), .dom_cset =3D &init_css_set, .tasks =3D LIST_HEAD_INIT(init_css_set.tasks), diff --git a/kernel/cpu.c b/kernel/cpu.c index 407a2568f35e..59530bd5da39 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2581,26 +2581,26 @@ const DECLARE_BITMAP(cpu_all_bits, NR_CPUS) =3D CPU= _BITS_ALL; EXPORT_SYMBOL(cpu_all_bits); =20 #ifdef CONFIG_INIT_ALL_POSSIBLE -struct cpumask __cpu_possible_mask __read_mostly +struct cpumask __cpu_possible_mask __asi_not_sensitive_readmostly =3D {CPU_BITS_ALL}; #else -struct cpumask __cpu_possible_mask __read_mostly; +struct cpumask __cpu_possible_mask __asi_not_sensitive_readmostly; #endif EXPORT_SYMBOL(__cpu_possible_mask); =20 -struct cpumask __cpu_online_mask __read_mostly; +struct cpumask __cpu_online_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_online_mask); =20 -struct cpumask __cpu_present_mask __read_mostly; +struct cpumask __cpu_present_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_present_mask); =20 
-struct cpumask __cpu_active_mask __read_mostly; +struct cpumask __cpu_active_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_active_mask); =20 -struct cpumask __cpu_dying_mask __read_mostly; +struct cpumask __cpu_dying_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_dying_mask); =20 -atomic_t __num_online_cpus __read_mostly; +atomic_t __num_online_cpus __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__num_online_cpus); =20 void init_cpu_present(const struct cpumask *src) diff --git a/kernel/events/core.c b/kernel/events/core.c index 30d94f68c5bd..6ea559b6e0f4 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9651,7 +9651,7 @@ static int perf_swevent_init(struct perf_event *event) return 0; } =20 -static struct pmu perf_swevent =3D { +static struct pmu perf_swevent __asi_not_sensitive =3D { .task_ctx_nr =3D perf_sw_context, =20 .capabilities =3D PERF_PMU_CAP_NO_NMI, @@ -9800,7 +9800,7 @@ static int perf_tp_event_init(struct perf_event *even= t) return 0; } =20 -static struct pmu perf_tracepoint =3D { +static struct pmu perf_tracepoint __asi_not_sensitive =3D { .task_ctx_nr =3D perf_sw_context, =20 .event_init =3D perf_tp_event_init, diff --git a/kernel/freezer.c b/kernel/freezer.c index 45ab36ffd0e7..6ca163e4880b 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -13,7 +13,7 @@ #include =20 /* total number of freezing conditions in effect */ -atomic_t system_freezing_cnt =3D ATOMIC_INIT(0); +atomic_t __asi_not_sensitive system_freezing_cnt =3D ATOMIC_INIT(0); EXPORT_SYMBOL(system_freezing_cnt); =20 /* indicate whether PM freezing is in effect, protected by diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 2270ec68f10a..1b8f51a37883 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -64,7 +64,7 @@ #include =20 #ifdef CONFIG_PROVE_LOCKING -int prove_locking =3D 1; +int prove_locking __asi_not_sensitive =3D 1; module_param(prove_locking, int, 0644); #else #define prove_locking 0 @@ -186,8 
+186,8 @@ unsigned long nr_zapped_classes; #ifndef CONFIG_DEBUG_LOCKDEP static #endif -struct lock_class lock_classes[MAX_LOCKDEP_KEYS]; -static DECLARE_BITMAP(lock_classes_in_use, MAX_LOCKDEP_KEYS); +struct lock_class lock_classes[MAX_LOCKDEP_KEYS] __asi_not_sensitive; +static DECLARE_BITMAP(lock_classes_in_use, MAX_LOCKDEP_KEYS) __asi_not_sen= sitive; =20 static inline struct lock_class *hlock_class(struct held_lock *hlock) { @@ -389,7 +389,7 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE= ]; #define __chainhashfn(chain) hash_long(chain, CHAINHASH_BITS) #define chainhashentry(chain) (chainhash_table + __chainhashfn((chain))) =20 -static struct hlist_head chainhash_table[CHAINHASH_SIZE]; +static struct hlist_head chainhash_table[CHAINHASH_SIZE] __asi_not_sensiti= ve; =20 /* * the id of held_lock @@ -599,7 +599,7 @@ u64 lockdep_stack_hash_count(void) unsigned int nr_hardirq_chains; unsigned int nr_softirq_chains; unsigned int nr_process_chains; -unsigned int max_lockdep_depth; +unsigned int max_lockdep_depth __asi_not_sensitive; =20 #ifdef CONFIG_DEBUG_LOCKDEP /* @@ -3225,8 +3225,8 @@ check_prevs_add(struct task_struct *curr, struct held= _lock *next) return 0; } =20 -struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS]; -static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS); +struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS] __asi_not_sensitive; +static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS) __asi_not_se= nsitive; static u16 chain_hlocks[MAX_LOCKDEP_CHAIN_HLOCKS]; unsigned long nr_zapped_lock_chains; unsigned int nr_free_chain_hlocks; /* Free chain_hlocks in buckets */ diff --git a/kernel/panic.c b/kernel/panic.c index cefd7d82366f..6d0ee3ddd58b 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -56,7 +56,7 @@ int panic_on_warn __read_mostly; unsigned long panic_on_taint; bool panic_on_taint_nousertaint =3D false; =20 -int panic_timeout =3D CONFIG_PANIC_TIMEOUT; +int panic_timeout __asi_not_sensitive =3D CONFIG_PANIC_TIMEOUT; 
EXPORT_SYMBOL_GPL(panic_timeout); =20 #define PANIC_PRINT_TASK_INFO 0x00000001 diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 57b132b658e1..3425fb1554d3 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -75,7 +75,7 @@ EXPORT_SYMBOL(ignore_console_lock_warning); * Low level drivers may need that to know if they can schedule in * their unblank() callback or not. So let's export it. */ -int oops_in_progress; +int oops_in_progress __asi_not_sensitive; EXPORT_SYMBOL(oops_in_progress); =20 /* @@ -2001,7 +2001,7 @@ static u8 *__printk_recursion_counter(void) local_irq_restore(flags); \ } while (0) =20 -int printk_delay_msec __read_mostly; +int printk_delay_msec __asi_not_sensitive_readmostly; =20 static inline void printk_delay(void) { diff --git a/kernel/profile.c b/kernel/profile.c index eb9c7f0f5ac5..c5beb9b0b0a8 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -44,10 +44,10 @@ static atomic_t *prof_buffer; static unsigned long prof_len; static unsigned short int prof_shift; =20 -int prof_on __read_mostly; +int prof_on __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(prof_on); =20 -static cpumask_var_t prof_cpu_mask; +static cpumask_var_t prof_cpu_mask __asi_not_sensitive; #if defined(CONFIG_SMP) && defined(CONFIG_PROC_FS) static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits); static DEFINE_PER_CPU(int, cpu_profile_flip); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ef8d36f580fc..284d2722cf0c 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -82,7 +82,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu= _data) =3D { .cblist.flags =3D SEGCBLIST_SOFTIRQ_ONLY, #endif }; -static struct rcu_state rcu_state =3D { +static struct rcu_state rcu_state __asi_not_sensitive =3D { .level =3D { &rcu_state.node[0] }, .gp_state =3D RCU_GP_IDLE, .gp_seq =3D (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, @@ -98,7 +98,7 @@ static struct rcu_state rcu_state =3D { static bool dump_tree; module_param(dump_tree, 
bool, 0444); /* By default, use RCU_SOFTIRQ instead of rcuc kthreads. */ -static bool use_softirq =3D !IS_ENABLED(CONFIG_PREEMPT_RT); +static __asi_not_sensitive bool use_softirq =3D !IS_ENABLED(CONFIG_PREEMPT= _RT); #ifndef CONFIG_PREEMPT_RT module_param(use_softirq, bool, 0444); #endif @@ -125,7 +125,7 @@ int rcu_num_nodes __read_mostly =3D NUM_RCU_NODES; /* T= otal # rcu_nodes in use. */ * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU * is fully initialized, including all of its kthreads having been spawned. */ -int rcu_scheduler_active __read_mostly; +int rcu_scheduler_active __asi_not_sensitive; EXPORT_SYMBOL_GPL(rcu_scheduler_active); =20 /* @@ -140,7 +140,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active); * early boot to take responsibility for these callbacks, but one step at * a time. */ -static int rcu_scheduler_fully_active __read_mostly; +static int rcu_scheduler_fully_active __asi_not_sensitive; =20 static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp, unsigned long gps, unsigned long flags); @@ -470,7 +470,7 @@ module_param(qovld, long, 0444); =20 static ulong jiffies_till_first_fqs =3D IS_ENABLED(CONFIG_RCU_STRICT_GRACE= _PERIOD) ? 
0 : ULONG_MAX; static ulong jiffies_till_next_fqs =3D ULONG_MAX; -static bool rcu_kick_kthreads; +static bool rcu_kick_kthreads __asi_not_sensitive; static int rcu_divisor =3D 7; module_param(rcu_divisor, int, 0644); =20 diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 156892c22bb5..b61a3854e62d 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -243,7 +243,7 @@ core_initcall(rcu_set_runtime_mode); =20 #ifdef CONFIG_DEBUG_LOCK_ALLOC static struct lock_class_key rcu_lock_key; -struct lockdep_map rcu_lock_map =3D { +struct lockdep_map rcu_lock_map __asi_not_sensitive =3D { .name =3D "rcu_read_lock", .key =3D &rcu_lock_key, .wait_type_outer =3D LD_WAIT_FREE, @@ -494,7 +494,7 @@ EXPORT_SYMBOL_GPL(rcutorture_sched_setaffinity); #ifdef CONFIG_RCU_STALL_COMMON int rcu_cpu_stall_ftrace_dump __read_mostly; module_param(rcu_cpu_stall_ftrace_dump, int, 0644); -int rcu_cpu_stall_suppress __read_mostly; // !0 =3D suppress stall warning= s. +int rcu_cpu_stall_suppress __asi_not_sensitive_readmostly; // !0 =3D suppr= ess stall warnings. EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress); module_param(rcu_cpu_stall_suppress, int, 0644); int rcu_cpu_stall_timeout __read_mostly =3D CONFIG_RCU_CPU_STALL_TIMEOUT; diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index c2b2859ddd82..6c3585053f05 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -84,7 +84,7 @@ static int __sched_clock_stable_early =3D 1; /* * We want: ktime_get_ns() + __gtod_offset =3D=3D sched_clock() + __sched_= clock_offset */ -__read_mostly u64 __sched_clock_offset; +__asi_not_sensitive u64 __sched_clock_offset; static __read_mostly u64 __gtod_offset; =20 struct sched_clock_data { diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 44ea197c16ea..e1c08ff4130e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -76,9 +76,9 @@ __read_mostly int sysctl_resched_latency_warn_once =3D 1; * Limited because this is done with IRQs disabled. 
*/ #ifdef CONFIG_PREEMPT_RT -const_debug unsigned int sysctl_sched_nr_migrate =3D 8; +unsigned int sysctl_sched_nr_migrate __asi_not_sensitive_readmostly =3D 8; #else -const_debug unsigned int sysctl_sched_nr_migrate =3D 32; +unsigned int sysctl_sched_nr_migrate __asi_not_sensitive_readmostly =3D 32; #endif =20 /* @@ -9254,7 +9254,7 @@ int in_sched_functions(unsigned long addr) * Default task group. * Every task in system belongs to this group at bootup. */ -struct task_group root_task_group; +struct task_group root_task_group __asi_not_sensitive; LIST_HEAD(task_groups); =20 /* Cacheline aligned slab cache for task_group */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 893eece65bfd..6e3da149125c 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -50,7 +50,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *c= a) } =20 static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage); -static struct cpuacct root_cpuacct =3D { +static struct cpuacct root_cpuacct __asi_not_sensitive =3D { .cpustat =3D &kernel_cpustat, .cpuusage =3D &root_cpuacct_cpuusage, }; diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 9392aea1804e..623b5feb142a 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -19,7 +19,7 @@ */ DEFINE_PER_CPU(struct irqtime, cpu_irqtime); =20 -static int sched_clock_irqtime; +static int __asi_not_sensitive sched_clock_irqtime; =20 void enable_sched_clock_irqtime(void) { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6e476f6d9435..dc9b6133b059 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -35,7 +35,7 @@ * * (default: 6ms * (1 + ilog(ncpus)), units: nanoseconds) */ -unsigned int sysctl_sched_latency =3D 6000000ULL; +__asi_not_sensitive unsigned int sysctl_sched_latency =3D 6000000ULL; static unsigned int normalized_sysctl_sched_latency =3D 6000000ULL; =20 /* @@ -90,7 +90,7 @@ unsigned int sysctl_sched_child_runs_first __read_mostly; unsigned int 
sysctl_sched_wakeup_granularity =3D 1000000UL; static unsigned int normalized_sysctl_sched_wakeup_granularity =3D 1000000= UL; =20 -const_debug unsigned int sysctl_sched_migration_cost =3D 500000UL; +unsigned int sysctl_sched_migration_cost __asi_not_sensitive_readmostly = =3D 500000UL; =20 int sched_thermal_decay_shift; static int __init setup_sched_thermal_decay_shift(char *str) diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c index 954b229868d9..af71cde93e98 100644 --- a/kernel/sched/loadavg.c +++ b/kernel/sched/loadavg.c @@ -57,7 +57,7 @@ =20 /* Variables and functions for calc_load */ atomic_long_t calc_load_tasks; -unsigned long calc_load_update; +unsigned long calc_load_update __asi_not_sensitive; unsigned long avenrun[3]; EXPORT_SYMBOL(avenrun); /* should be removed */ =20 diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index b48baaba2fc2..9d5fbe66d355 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -14,7 +14,7 @@ static const u64 max_rt_runtime =3D MAX_BW; =20 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun= ); =20 -struct rt_bandwidth def_rt_bandwidth; +struct rt_bandwidth def_rt_bandwidth __asi_not_sensitive; =20 static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer) { diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0e66749486e7..517c70a29a57 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2379,8 +2379,8 @@ extern void deactivate_task(struct rq *rq, struct tas= k_struct *p, int flags); =20 extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int f= lags); =20 -extern const_debug unsigned int sysctl_sched_nr_migrate; -extern const_debug unsigned int sysctl_sched_migration_cost; +extern unsigned int sysctl_sched_nr_migrate; +extern unsigned int sysctl_sched_migration_cost; =20 #ifdef CONFIG_SCHED_DEBUG extern unsigned int sysctl_sched_latency; diff --git a/kernel/smp.c b/kernel/smp.c index 01a7c1706a58..c51fd981a4a9 100644 --- 
a/kernel/smp.c +++ b/kernel/smp.c @@ -1070,7 +1070,7 @@ static int __init maxcpus(char *str) early_param("maxcpus", maxcpus); =20 /* Setup number of possible processor ids */ -unsigned int nr_cpu_ids __read_mostly =3D NR_CPUS; +unsigned int nr_cpu_ids __asi_not_sensitive =3D NR_CPUS; EXPORT_SYMBOL(nr_cpu_ids); =20 /* An arch may set nr_cpu_ids earlier if needed, so this would be redundan= t */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 41f470929e99..c462b7fab4d3 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -56,7 +56,8 @@ DEFINE_PER_CPU_ALIGNED(irq_cpustat_t, irq_stat); EXPORT_PER_CPU_SYMBOL(irq_stat); #endif =20 -static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_= in_smp; +static struct softirq_action softirq_vec[NR_SOFTIRQS] +__asi_not_sensitive ____cacheline_aligned; =20 DEFINE_PER_CPU(struct task_struct *, ksoftirqd); =20 diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 0ea8702eb516..8b176f5c01f2 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -706,7 +706,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_ba= se, int skip_equal) * High resolution timer enabled ? */ static bool hrtimer_hres_enabled __read_mostly =3D true; -unsigned int hrtimer_resolution __read_mostly =3D LOW_RES_NSEC; +unsigned int hrtimer_resolution __asi_not_sensitive =3D LOW_RES_NSEC; EXPORT_SYMBOL_GPL(hrtimer_resolution); =20 /* diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c index bc4db9e5ab70..c60f8da1cfb5 100644 --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -40,7 +40,13 @@ static struct clocksource clocksource_jiffies =3D { .max_cycles =3D 10, }; =20 -__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); +/* TODO(oweisse): __cacheline_aligned_in_smp is expanded to + __section__(".data..cacheline_aligned"))) which is at odds with + __asi_not_sensitive. 
We should consider instead using + __attribute__ ((__aligned__(XXX))) where XXX is a def for cacheline or + something*/ +/* __cacheline_aligned_in_smp */ +__asi_not_sensitive DEFINE_RAW_SPINLOCK(jiffies_lock); __cacheline_aligned_in_smp seqcount_raw_spinlock_t jiffies_seq =3D SEQCNT_RAW_SPINLOCK_ZERO(jiffies_seq, &jiffies_lock); =20 diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 406dccb79c2b..23711fb94323 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -31,13 +31,13 @@ =20 =20 /* USER_HZ period (usecs): */ -unsigned long tick_usec =3D USER_TICK_USEC; +unsigned long tick_usec __asi_not_sensitive =3D USER_TICK_USEC; =20 /* SHIFTED_HZ period (nsecs): */ -unsigned long tick_nsec; +unsigned long tick_nsec __asi_not_sensitive; =20 -static u64 tick_length; -static u64 tick_length_base; +static u64 tick_length __asi_not_sensitive; +static u64 tick_length_base __asi_not_sensitive; =20 #define SECS_PER_DAY 86400 #define MAX_TICKADJ 500LL /* usecs */ @@ -54,36 +54,36 @@ static u64 tick_length_base; * * (TIME_ERROR prevents overwriting the CMOS clock) */ -static int time_state =3D TIME_OK; +static int time_state __asi_not_sensitive =3D TIME_OK; =20 /* clock status bits: */ -static int time_status =3D STA_UNSYNC; +static int time_status __asi_not_sensitive =3D STA_UNSYNC; =20 /* time adjustment (nsecs): */ -static s64 time_offset; +static s64 time_offset __asi_not_sensitive; =20 /* pll time constant: */ -static long time_constant =3D 2; +static long time_constant __asi_not_sensitive =3D 2; =20 /* maximum error (usecs): */ -static long time_maxerror =3D NTP_PHASE_LIMIT; +static long time_maxerror __asi_not_sensitive =3D NTP_PHASE_LIMIT; =20 /* estimated error (usecs): */ -static long time_esterror =3D NTP_PHASE_LIMIT; +static long time_esterror __asi_not_sensitive =3D NTP_PHASE_LIMIT; =20 /* frequency offset (scaled nsecs/secs): */ -static s64 time_freq; +static s64 time_freq __asi_not_sensitive; =20 /* time at last adjustment (secs): */ -static time64_t 
time_reftime; +static time64_t time_reftime __asi_not_sensitive; =20 -static long time_adjust; +static long time_adjust __asi_not_sensitive; =20 /* constant (boot-param configurable) NTP tick adjustment (upscaled) */ -static s64 ntp_tick_adj; +static s64 ntp_tick_adj __asi_not_sensitive; =20 /* second value of the next pending leapsecond, or TIME64_MAX if no leap */ -static time64_t ntp_next_leap_sec =3D TIME64_MAX; +static time64_t ntp_next_leap_sec __asi_not_sensitive =3D TIME64_MAX; =20 #ifdef CONFIG_NTP_PPS =20 diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 46789356f856..cbe75661ca74 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -31,7 +31,7 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device); * CPU which handles the tick and protected by jiffies_lock. There is * no requirement to write hold the jiffies seqcount for it. */ -ktime_t tick_next_period; +ktime_t tick_next_period __asi_not_sensitive; =20 /* * tick_do_timer_cpu is a timer core internal variable which holds the CPU= NR @@ -47,7 +47,7 @@ ktime_t tick_next_period; * at it will take over and keep the time keeping alive. The handover * procedure also covers cpu hotplug. 
*/ -int tick_do_timer_cpu __read_mostly =3D TICK_DO_TIMER_BOOT; +int tick_do_timer_cpu __asi_not_sensitive_readmostly =3D TICK_DO_TIMER_BOO= T; #ifdef CONFIG_NO_HZ_FULL /* * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 649f2b48e8f0..ed7e2a18060a 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -15,7 +15,7 @@ =20 DECLARE_PER_CPU(struct tick_device, tick_cpu_device); extern ktime_t tick_next_period; -extern int tick_do_timer_cpu __read_mostly; +extern int tick_do_timer_cpu; =20 extern void tick_setup_periodic(struct clock_event_device *dev, int broadc= ast); extern void tick_handle_periodic(struct clock_event_device *dev); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 17a283ce2b20..c23fecbb68c2 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -49,7 +49,7 @@ struct tick_sched *tick_get_tick_sched(int cpu) * jiffies_lock and jiffies_seq. tick_nohz_next_event() needs to get a * consistent view of jiffies and last_jiffies_update. */ -static ktime_t last_jiffies_update; +static ktime_t last_jiffies_update __asi_not_sensitive; =20 /* * Must be called with interrupts disabled ! 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index dcdcb85121e4..120395965e45 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -39,7 +39,7 @@ enum timekeeping_adv_mode { TK_ADV_FREQ }; =20 -DEFINE_RAW_SPINLOCK(timekeeper_lock); +__asi_not_sensitive DEFINE_RAW_SPINLOCK(timekeeper_lock); =20 /* * The most important data for readout fits into a single 64 byte @@ -48,14 +48,14 @@ DEFINE_RAW_SPINLOCK(timekeeper_lock); static struct { seqcount_raw_spinlock_t seq; struct timekeeper timekeeper; -} tk_core ____cacheline_aligned =3D { +} tk_core ____cacheline_aligned __asi_not_sensitive =3D { .seq =3D SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock), }; =20 -static struct timekeeper shadow_timekeeper; +static struct timekeeper shadow_timekeeper __asi_not_sensitive; =20 /* flag for if timekeeping is suspended */ -int __read_mostly timekeeping_suspended; +int __asi_not_sensitive_readmostly timekeeping_suspended; =20 /** * struct tk_fast - NMI safe timekeeper @@ -72,7 +72,7 @@ struct tk_fast { }; =20 /* Suspend-time cycles value for halted fast timekeeper. 
*/ -static u64 cycles_at_suspend; +static u64 cycles_at_suspend __asi_not_sensitive; =20 static u64 dummy_clock_read(struct clocksource *cs) { diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h index 543beba096c7..b32ee75808fe 100644 --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -26,7 +26,7 @@ extern void update_process_times(int user); extern void do_timer(unsigned long ticks); extern void update_wall_time(void); =20 -extern raw_spinlock_t jiffies_lock; +extern __asi_not_sensitive raw_spinlock_t jiffies_lock; extern seqcount_raw_spinlock_t jiffies_seq; =20 #define CS_NAME_LEN 32 diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 85f1021ad459..0b09c99b568c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -56,7 +56,7 @@ #define CREATE_TRACE_POINTS #include =20 -__visible u64 jiffies_64 __cacheline_aligned_in_smp =3D INITIAL_JIFFIES; +u64 jiffies_64 __asi_not_sensitive ____cacheline_aligned =3D INITIAL_JIFFI= ES; =20 EXPORT_SYMBOL(jiffies_64); =20 diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 78ea542ce3bc..eaec3814c5a4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -432,7 +432,7 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export); * The global_trace is the descriptor that holds the top-level tracing * buffers for the live tracing. 
*/ -static struct trace_array global_trace =3D { +static struct trace_array global_trace __asi_not_sensitive =3D { .trace_flags =3D TRACE_DEFAULT_FLAGS, }; =20 diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_s= witch.c index e304196d7c28..d49db8e2430a 100644 --- a/kernel/trace/trace_sched_switch.c +++ b/kernel/trace/trace_sched_switch.c @@ -16,8 +16,8 @@ #define RECORD_CMDLINE 1 #define RECORD_TGID 2 =20 -static int sched_cmdline_ref; -static int sched_tgid_ref; +static int sched_cmdline_ref __asi_not_sensitive; +static int sched_tgid_ref __asi_not_sensitive; static DEFINE_MUTEX(sched_register_mutex); =20 static void diff --git a/lib/debug_locks.c b/lib/debug_locks.c index a75ee30b77cb..f2d217859be6 100644 --- a/lib/debug_locks.c +++ b/lib/debug_locks.c @@ -14,6 +14,7 @@ #include #include #include +#include =20 /* * We want to turn all lock-debugging facilities on/off at once, @@ -22,7 +23,7 @@ * that would just muddy the log. So we report the first one and * shut up after that. */ -int debug_locks __read_mostly =3D 1; +int debug_locks __asi_not_sensitive_readmostly =3D 1; EXPORT_SYMBOL_GPL(debug_locks); =20 /* @@ -30,7 +31,7 @@ EXPORT_SYMBOL_GPL(debug_locks); * 'silent failure': nothing is printed to the console when * a locking bug is detected. 
*/ -int debug_locks_silent __read_mostly; +int debug_locks_silent __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(debug_locks_silent); =20 /* diff --git a/mm/memory.c b/mm/memory.c index 667ece86e051..5aa39d0aba2b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -152,7 +152,7 @@ static int __init disable_randmaps(char *s) } __setup("norandmaps", disable_randmaps); =20 -unsigned long zero_pfn __read_mostly; +unsigned long zero_pfn __asi_not_sensitive; EXPORT_SYMBOL(zero_pfn); =20 unsigned long highest_memmap_pfn __read_mostly; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 998ff6a56732..9c850b8bd1fc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -183,7 +183,7 @@ unsigned long totalreserve_pages __read_mostly; unsigned long totalcma_pages __read_mostly; =20 int percpu_pagelist_high_fraction; -gfp_t gfp_allowed_mask __read_mostly =3D GFP_BOOT_MASK; +gfp_t gfp_allowed_mask __asi_not_sensitive_readmostly =3D GFP_BOOT_MASK; DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc); EXPORT_SYMBOL(init_on_alloc); =20 diff --git a/mm/sparse.c b/mm/sparse.c index e5c84b0cf0c9..64dcf7fceaed 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -24,10 +24,10 @@ * 1) mem_section - memory sections, mem_map's for valid memory */ #ifdef CONFIG_SPARSEMEM_EXTREME -struct mem_section **mem_section; +struct mem_section **mem_section __asi_not_sensitive; #else struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] - ____cacheline_internodealigned_in_smp; + ____cacheline_internodealigned_in_smp __asi_not_sensitive; #endif EXPORT_SYMBOL(mem_section); =20 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e8e9c8588908..0af973b950c2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3497,7 +3497,7 @@ static int kvm_vcpu_release(struct inode *inode, stru= ct file *filp) return 0; } =20 -static struct file_operations kvm_vcpu_fops =3D { +static struct file_operations kvm_vcpu_fops __asi_not_sensitive =3D { .release =3D kvm_vcpu_release, 
 	.unlocked_ioctl = kvm_vcpu_ioctl,
 	.mmap = kvm_vcpu_mmap,
-- 
2.35.1.473.g83b2b277ed-goog

From nobody Fri Jun 26 13:44:01 2026
Date: Tue, 22 Feb 2022 21:22:18 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-43-junaids@google.com>
References: <20220223052223.1202152-1-junaids@google.com>
Subject: [RFC PATCH 42/47] mm: asi: Annotation of PERCPU variables to be nonsensitive
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: Ofir Weisse, kvm@vger.kernel.org, pbonzini@redhat.com,
	jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com,
	rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org,
	tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

From: Ofir Weisse

The heart of ASI is to differentiate between sensitive and non-sensitive
data access. This commit marks certain static PERCPU variables as not
sensitive. Some static variables are accessed frequently and therefore
would cause many ASI exits.
The frequency of these accesses is monitored by tracing asi_exits and
analyzing the accessed addresses. Many of these variables don't contain
sensitive information and can therefore be mapped into the global ASI
region. This commit changes DEFINE_PER_CPU to
DEFINE_PER_CPU_ASI_NOT_SENSITIVE for variables that are frequently
accessed yet not sensitive. The end result is a very significant
reduction in ASI exits on real benchmarks.

Signed-off-by: Ofir Weisse
---
 arch/x86/events/core.c                |  2 +-
 arch/x86/events/intel/bts.c           |  2 +-
 arch/x86/events/perf_event.h          |  2 +-
 arch/x86/include/asm/asi.h            |  2 +-
 arch/x86/include/asm/current.h        |  2 +-
 arch/x86/include/asm/debugreg.h       |  2 +-
 arch/x86/include/asm/desc.h           |  2 +-
 arch/x86/include/asm/fpu/api.h        |  2 +-
 arch/x86/include/asm/hardirq.h        |  2 +-
 arch/x86/include/asm/hw_irq.h         |  2 +-
 arch/x86/include/asm/percpu.h         |  2 +-
 arch/x86/include/asm/preempt.h        |  2 +-
 arch/x86/include/asm/processor.h      | 12 ++++++------
 arch/x86/include/asm/smp.h            |  2 +-
 arch/x86/include/asm/tlbflush.h       |  4 ++--
 arch/x86/include/asm/topology.h       |  2 +-
 arch/x86/kernel/apic/apic.c           |  2 +-
 arch/x86/kernel/apic/x2apic_cluster.c |  6 +++---
 arch/x86/kernel/cpu/common.c          | 12 ++++++------
 arch/x86/kernel/fpu/core.c            |  2 +-
 arch/x86/kernel/hw_breakpoint.c       |  2 +-
 arch/x86/kernel/irq.c                 |  2 +-
 arch/x86/kernel/irqinit.c             |  2 +-
 arch/x86/kernel/nmi.c                 |  6 +++---
 arch/x86/kernel/process.c             |  4 ++--
 arch/x86/kernel/setup_percpu.c        |  4 ++--
 arch/x86/kernel/smpboot.c             |  3 ++-
 arch/x86/kernel/tsc.c                 |  2 +-
 arch/x86/kvm/x86.c                    |  2 +-
 arch/x86/kvm/x86.h                    |  2 +-
 arch/x86/mm/asi.c                     |  2 +-
 arch/x86/mm/init.c                    |  2 +-
 arch/x86/mm/tlb.c                     |  2 +-
 include/asm-generic/irq_regs.h        |  2 +-
 include/linux/arch_topology.h         |  2 +-
 include/linux/hrtimer.h               |  2 +-
 include/linux/interrupt.h             |  2 +-
 include/linux/kernel_stat.h           |  4 ++--
 include/linux/prandom.h               |  2 +-
 kernel/events/core.c                  |  6 +++---
 kernel/irq_work.c                     |  6 +++---
 kernel/rcu/tree.c                     |  2 +-
 kernel/sched/core.c                   |  6 +++---
 kernel/sched/cpufreq.c                |  3 ++-
 kernel/sched/cputime.c                |  2 +-
 kernel/sched/sched.h                  | 21 +++++++++++----------
 kernel/sched/topology.c               | 14 +++++++-------
 kernel/smp.c                          |  7 ++++---
 kernel/softirq.c                      |  2 +-
 kernel/time/hrtimer.c                 |  2 +-
 kernel/time/tick-common.c             |  2 +-
 kernel/time/tick-internal.h           |  4 ++--
 kernel/time/tick-sched.c              |  2 +-
 kernel/time/timer.c                   |  2 +-
 kernel/trace/trace.c                  |  2 +-
 kernel/trace/trace_preemptirq.c       |  2 +-
 kernel/watchdog.c                     | 12 ++++++------
 lib/irq_regs.c                        |  2 +-
 lib/random32.c                        |  3 ++-
 virt/kvm/kvm_main.c                   |  2 +-
 60 files changed, 112 insertions(+), 107 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index db825bf053fd..2d9829d774d7 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -47,7 +47,7 @@
 struct x86_pmu x86_pmu __asi_not_sensitive_readmostly;
 static struct pmu pmu;
 
-DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
+DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cpu_hw_events, cpu_hw_events) = {
 	.enabled = 1,
 	.pmu = &pmu,
 };
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 974e917e65b2..06d9de514b0d 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -36,7 +36,7 @@ enum {
 	BTS_STATE_ACTIVE,
 };
 
-static DEFINE_PER_CPU(struct bts_ctx, bts_ctx);
+static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct bts_ctx, bts_ctx);
 
 #define BTS_RECORD_SIZE		24
 #define BTS_SAFETY_MARGIN	4080
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27cca7fd6f17..9a4855e6ffa6 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1036,7 +1036,7 @@ static inline bool x86_pmu_has_lbr_callstack(void)
 		x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
 }
 
-DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
+DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct cpu_hw_events, cpu_hw_events);
 
 int x86_perf_event_set_period(struct perf_event *event);
 
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index
d43f6aadffee..6148e65fb0c2 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -52,7 +52,7 @@ struct asi_pgtbl_pool { uint count; }; =20 -DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +DECLARE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct asi_state, asi_cpu_state); =20 extern pgd_t asi_global_nonsensitive_pgd[]; =20 diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h index 3e204e6140b5..a4bcf1f305bf 100644 --- a/arch/x86/include/asm/current.h +++ b/arch/x86/include/asm/current.h @@ -8,7 +8,7 @@ #ifndef __ASSEMBLY__ struct task_struct; =20 -DECLARE_PER_CPU(struct task_struct *, current_task); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task); =20 static __always_inline struct task_struct *get_current(void) { diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugre= g.h index cfdf307ddc01..fa67db27b098 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -6,7 +6,7 @@ #include #include =20 -DECLARE_PER_CPU(unsigned long, cpu_dr7); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_dr7); =20 #ifndef CONFIG_PARAVIRT_XXL /* diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index ab97b22ac04a..7d9fff8c9543 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -298,7 +298,7 @@ static inline void native_load_tls(struct thread_struct= *t, unsigned int cpu) gdt[GDT_ENTRY_TLS_MIN + i] =3D t->tls_array[i]; } =20 -DECLARE_PER_CPU(bool, __tss_limit_invalid); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(bool, __tss_limit_invalid); =20 static inline void force_reload_TR(void) { diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 6f5ca3c2ef4a..15abb1b05fbc 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -121,7 +121,7 @@ static inline void fpstate_init_soft(struct swregs_stat= e *soft) {} #endif =20 /* State tracking */ -DECLARE_PER_CPU(struct fpu *, 
fpu_fpregs_owner_ctx); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct fpu *, fpu_fpregs_owner_ctx); =20 /* Process cleanup */ #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 275e7fd20310..2f70deca4a20 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -46,7 +46,7 @@ typedef struct { #endif } ____cacheline_aligned irq_cpustat_t; =20 -DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(irq_cpustat_t, irq_stat); =20 #define __ARCH_IRQ_STAT =20 diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index d465ece58151..e561abfce735 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -128,7 +128,7 @@ extern char spurious_entries_start[]; #define VECTOR_RETRIGGERED ((void *)-2L) =20 typedef struct irq_desc* vector_irq_t[NR_VECTORS]; -DECLARE_PER_CPU(vector_irq_t, vector_irq); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(vector_irq_t, vector_irq); =20 #endif /* !ASSEMBLY_ */ =20 diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index a3c33b79fb86..f9486bbe8a76 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -390,7 +390,7 @@ static inline bool x86_this_cpu_variable_test_bit(int n= r, #include =20 /* We can use this directly for local CPU (faster). 
*/ -DECLARE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, this_cpu_off); =20 #endif /* !__ASSEMBLY__ */ =20 diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h index fe5efbcba824..204a8532b870 100644 --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h @@ -7,7 +7,7 @@ #include #include =20 -DECLARE_PER_CPU(int, __preempt_count); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, __preempt_count); =20 /* We use the MSB mostly because its available */ #define PREEMPT_NEED_RESCHED 0x80000000 diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/proces= sor.h index 20116efd2756..63831f9a503b 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -417,14 +417,14 @@ struct tss_struct { struct x86_io_bitmap io_bitmap; } __aligned(PAGE_SIZE); =20 -DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); +DECLARE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(struct tss_struct, cpu_tss_= rw); =20 /* Per CPU interrupt stacks */ struct irq_stack { char stack[IRQ_STACK_SIZE]; } __aligned(IRQ_STACK_SIZE); =20 -DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_current_top_of_stack); =20 #ifdef CONFIG_X86_64 struct fixed_percpu_data { @@ -448,8 +448,8 @@ static inline unsigned long cpu_kernelmode_gs_base(int = cpu) return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu); } =20 -DECLARE_PER_CPU(void *, hardirq_stack_ptr); -DECLARE_PER_CPU(bool, hardirq_stack_inuse); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(void *, hardirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(bool, hardirq_stack_inuse); extern asmlinkage void ignore_sysret(void); =20 /* Save actual FS/GS selectors and bases to current->thread */ @@ -458,8 +458,8 @@ void current_save_fsgs(void); #ifdef CONFIG_STACKPROTECTOR DECLARE_PER_CPU(unsigned long, __stack_chk_guard); #endif -DECLARE_PER_CPU(struct irq_stack *, 
hardirq_stack_ptr); -DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irq_stack *, hardirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irq_stack *, softirq_stack_ptr); #endif /* !X86_64 */ =20 struct perf_event; diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 81a0211a372d..8d85a918532e 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -19,7 +19,7 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared= _map); DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map); DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id); DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id); -DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, cpu_number); =20 static inline struct cpumask *cpu_llc_shared_mask(int cpu) { diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 7d04aa2a5f86..adcdeb58d817 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -151,7 +151,7 @@ struct tlb_state { */ struct tlb_context ctxs[TLB_NR_DYN_ASIDS]; }; -DECLARE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state, cpu_tlb= state); =20 struct tlb_state_shared { /* @@ -171,7 +171,7 @@ struct tlb_state_shared { */ bool is_lazy; }; -DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_share= d); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state_shared, = cpu_tlbstate_shared); =20 bool nmi_uaccess_okay(void); #define nmi_uaccess_okay nmi_uaccess_okay diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topolog= y.h index cc164777e661..bff1a9123469 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -203,7 +203,7 @@ DECLARE_STATIC_KEY_FALSE(arch_scale_freq_key); =20 #define arch_scale_freq_invariant() static_branch_likely(&arch_scale_freq_= key) =20 
-DECLARE_PER_CPU(unsigned long, arch_freq_scale); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale); =20 static inline long arch_scale_freq_capacity(int cpu) { diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b70344bf6600..5fa0ce0ecfb3 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -548,7 +548,7 @@ static struct clock_event_device lapic_clockevent =3D { .rating =3D 100, .irq =3D -1, }; -static DEFINE_PER_CPU(struct clock_event_device, lapic_events); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct clock_event_device, lapic_e= vents); =20 static const struct x86_cpu_id deadline_match[] __initconst =3D { X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(HASWELL_X, X86_STEPPINGS(0x2, 0x2), = 0x3a), /* EP */ diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x= 2apic_cluster.c index e696e22d0531..655fe820a240 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -20,10 +20,10 @@ struct cluster_mask { * x86_cpu_to_logical_apicid for all online cpus in a sequential way. * Using per cpu variable would cost one cache line per cpu. */ -static u32 *x86_cpu_to_logical_apicid __read_mostly; +static u32 *x86_cpu_to_logical_apicid __asi_not_sensitive_readmostly; =20 -static DEFINE_PER_CPU(cpumask_var_t, ipi_mask); -static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(cpumask_var_t, ipi_mask); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cluster_mask *, cluster_mas= ks); static struct cluster_mask *cluster_hotplug_mask; =20 static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0083464de5e3..471b3a42db64 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1775,17 +1775,17 @@ EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data); * The following percpu variables are hot. 
Align current_task to * cacheline size such that they fall in the same cacheline. */ -DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned = =3D +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task) ____c= acheline_aligned =3D &init_task; EXPORT_PER_CPU_SYMBOL(current_task); =20 -DEFINE_PER_CPU(void *, hardirq_stack_ptr); -DEFINE_PER_CPU(bool, hardirq_stack_inuse); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(void *, hardirq_stack_ptr); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, hardirq_stack_inuse); =20 -DEFINE_PER_CPU(int, __preempt_count) =3D INIT_PREEMPT_COUNT; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, __preempt_count) =3D INIT_PREEMPT_CO= UNT; EXPORT_PER_CPU_SYMBOL(__preempt_count); =20 -DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =3D TOP_OF_INIT_ST= ACK; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_current_top_of_stack) = =3D TOP_OF_INIT_STACK; =20 /* May not be marked __init: used by software suspend */ void syscall_init(void) @@ -1826,7 +1826,7 @@ void syscall_init(void) =20 #else /* CONFIG_X86_64 */ =20 -DEFINE_PER_CPU(struct task_struct *, current_task) =3D &init_task; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task) =3D &= init_task; EXPORT_PER_CPU_SYMBOL(current_task); DEFINE_PER_CPU(int, __preempt_count) =3D INIT_PREEMPT_COUNT; EXPORT_PER_CPU_SYMBOL(__preempt_count); diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index d7859573973d..b59317c5721f 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -57,7 +57,7 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu); /* * Track which context is using the FPU on the CPU: */ -DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct fpu *, fpu_fpregs_owner_ctx); =20 struct kmem_cache *fpstate_cachep; =20 diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoin= t.c index 668a4a6533d9..c2ceea8f6801 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ 
b/arch/x86/kernel/hw_breakpoint.c @@ -36,7 +36,7 @@ #include =20 /* Per cpu debug control register value */ -DEFINE_PER_CPU(unsigned long, cpu_dr7); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_dr7); EXPORT_PER_CPU_SYMBOL(cpu_dr7); =20 /* Per cpu debug address registers values */ diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 766ffe3ba313..5c5aa75050a5 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -26,7 +26,7 @@ #define CREATE_TRACE_POINTS #include =20 -DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(irq_cpustat_t, irq_stat); EXPORT_PER_CPU_SYMBOL(irq_stat); =20 atomic_t irq_err_count; diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index beb1bada1b0a..d7893e040695 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -46,7 +46,7 @@ * (these are usually mapped into the 0x30-0xff vector range) */ =20 -DEFINE_PER_CPU(vector_irq_t, vector_irq) =3D { +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(vector_irq_t, vector_irq) =3D { [0 ... NR_VECTORS - 1] =3D VECTOR_UNUSED, }; =20 diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 4bce802d25fb..ef95071228ca 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -469,9 +469,9 @@ enum nmi_states { NMI_EXECUTING, NMI_LATCHED, }; -static DEFINE_PER_CPU(enum nmi_states, nmi_state); -static DEFINE_PER_CPU(unsigned long, nmi_cr2); -static DEFINE_PER_CPU(unsigned long, nmi_dr7); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(enum nmi_states, nmi_state); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, nmi_cr2); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, nmi_dr7); =20 DEFINE_IDTENTRY_RAW(exc_nmi) { diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index f9bd1c3415d4..e4a32490dda0 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -56,7 +56,7 @@ * section. 
Since TSS's are completely CPU-local, we want them * on exact cacheline boundaries, to eliminate cacheline ping-pong. */ -__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) =3D { +__visible DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(struct tss_struct,= cpu_tss_rw) =3D { .x86_tss =3D { /* * .sp0 is only used when entering ring 0 from a lower @@ -77,7 +77,7 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, = cpu_tss_rw) =3D { }; EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); =20 -DEFINE_PER_CPU(bool, __tss_limit_invalid); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, __tss_limit_invalid); EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid); =20 void __init arch_task_cache_init(void) diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c index 7b65275544b2..13c94a512b7e 100644 --- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -23,7 +23,7 @@ #include #include =20 -DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, cpu_number); EXPORT_PER_CPU_SYMBOL(cpu_number); =20 #ifdef CONFIG_X86_64 @@ -32,7 +32,7 @@ EXPORT_PER_CPU_SYMBOL(cpu_number); #define BOOT_PERCPU_OFFSET 0 #endif =20 -DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off) =3D BOOT_PERCPU_OF= FSET; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, this_cpu_off) =3D BOOT_PER= CPU_OFFSET; EXPORT_PER_CPU_SYMBOL(this_cpu_off); =20 unsigned long __per_cpu_offset[NR_CPUS] __ro_after_init =3D { diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 617012f4619f..0cfc4fdc2476 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -2224,7 +2224,8 @@ static void disable_freq_invariance_workfn(struct wor= k_struct *work) static DECLARE_WORK(disable_freq_invariance_work, disable_freq_invariance_workfn); =20 -DEFINE_PER_CPU(unsigned long, arch_freq_scale) =3D SCHED_CAPACITY_SCALE; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale) =3D + SCHED_CAPACITY_SCA= LE; =20 void 
arch_scale_freq_tick(void) { diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index d7169da99b01..39c441409dec 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -59,7 +59,7 @@ struct cyc2ns { =20 }; /* fits one cacheline */ =20 -static DEFINE_PER_CPU_ALIGNED(struct cyc2ns, cyc2ns); +static DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct cyc2ns, cyc2ns); =20 static int __init tsc_early_khz_setup(char *buf) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0df88eadab60..451872d178e5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8523,7 +8523,7 @@ static void kvm_timer_init(void) kvmclock_cpu_online, kvmclock_cpu_down_prep); } =20 -DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, current_vcpu); EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu); =20 int kvm_is_in_guest(void) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 4abcd8d9836d..3d5da4daaf53 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -392,7 +392,7 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm) return kvm->arch.cstate_in_guest; } =20 -DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, current_vcpu); =20 static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index fdc117929fc7..04628949e89d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -20,7 +20,7 @@ static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); =20 -DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct asi_state, asi_cpu_state); EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); =20 __aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index dfff17363365..012631d03c4f 100644 --- a/arch/x86/mm/init.c +++ 
b/arch/x86/mm/init.c @@ -1025,7 +1025,7 @@ void __init zone_sizes_init(void) free_area_init(max_zone_pfns); } =20 -__visible DEFINE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate) =3D { +__visible DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state= , cpu_tlbstate) =3D { .loaded_mm =3D &init_mm, .next_asid =3D 1, .cr4 =3D ~0UL, /* fail hard if we screw up cr4 shadow initialization */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index fcd2c8e92f83..36d41356ed04 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -972,7 +972,7 @@ static bool tlb_is_not_lazy(int cpu) =20 static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask); =20 -DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared= ); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state_shared, c= pu_tlbstate_shared); EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared); =20 STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask, diff --git a/include/asm-generic/irq_regs.h b/include/asm-generic/irq_regs.h index 2e7c6e89d42e..3225bdb2aefa 100644 --- a/include/asm-generic/irq_regs.h +++ b/include/asm-generic/irq_regs.h @@ -14,7 +14,7 @@ * Per-cpu current frame pointer - the location of the last exception fram= e on * the stack */ -DECLARE_PER_CPU(struct pt_regs *, __irq_regs); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct pt_regs *, __irq_regs); =20 static inline struct pt_regs *get_irq_regs(void) { diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index b97cea83b25e..35fdf256777a 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -23,7 +23,7 @@ static inline unsigned long topology_get_cpu_scale(int cp= u) =20 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity); =20 -DECLARE_PER_CPU(unsigned long, arch_freq_scale); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale); =20 static inline unsigned long topology_get_freq_scale(int cpu) { diff --git a/include/linux/hrtimer.h 
b/include/linux/hrtimer.h index 0ee140176f10..68b2f10aaa46 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -355,7 +355,7 @@ static inline void timerfd_clock_was_set(void) { } static inline void timerfd_resume(void) { } #endif =20 -DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); =20 #ifdef CONFIG_PREEMPT_RT void hrtimer_cancel_wait_running(const struct hrtimer *timer); diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 1f22a30c0963..6ae485d2ebb3 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -554,7 +554,7 @@ extern void __raise_softirq_irqoff(unsigned int nr); extern void raise_softirq_irqoff(unsigned int nr); extern void raise_softirq(unsigned int nr); =20 -DECLARE_PER_CPU(struct task_struct *, ksoftirqd); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, ksoftirqd); =20 static inline struct task_struct *this_cpu_ksoftirqd(void) { diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 69ae6b278464..89609dc5d30f 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -40,8 +40,8 @@ struct kernel_stat { unsigned int softirqs[NR_SOFTIRQS]; }; =20 -DECLARE_PER_CPU(struct kernel_stat, kstat); -DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_stat, kstat); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_cpustat, kernel_cpustat); =20 /* Must have preemption disabled for this to be meaningful. 
*/ #define kstat_this_cpu this_cpu_ptr(&kstat) diff --git a/include/linux/prandom.h b/include/linux/prandom.h index 056d31317e49..f02392ca6dc2 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -16,7 +16,7 @@ void prandom_bytes(void *buf, size_t nbytes); void prandom_seed(u32 seed); void prandom_reseed_late(void); =20 -DECLARE_PER_CPU(unsigned long, net_rand_noise); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, net_rand_noise); =20 #define PRANDOM_ADD_NOISE(a, b, c, d) \ prandom_u32_add_noise((unsigned long)(a), (unsigned long)(b), \ diff --git a/kernel/events/core.c b/kernel/events/core.c index 6ea559b6e0f4..1914cc538cab 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1207,7 +1207,7 @@ void perf_pmu_enable(struct pmu *pmu) pmu->pmu_enable(pmu); } =20 -static DEFINE_PER_CPU(struct list_head, active_ctx_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct list_head, active_ctx_list); =20 /* * perf_event_ctx_activate(), perf_event_ctx_deactivate(), and @@ -4007,8 +4007,8 @@ do { \ return div64_u64(dividend, divisor); } =20 -static DEFINE_PER_CPU(int, perf_throttled_count); -static DEFINE_PER_CPU(u64, perf_throttled_seq); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, perf_throttled_count); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(u64, perf_throttled_seq); =20 static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 cou= nt, bool disable) { diff --git a/kernel/irq_work.c b/kernel/irq_work.c index f7df715ec28e..10df3577c733 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -22,9 +22,9 @@ #include #include =20 -static DEFINE_PER_CPU(struct llist_head, raised_list); -static DEFINE_PER_CPU(struct llist_head, lazy_list); -static DEFINE_PER_CPU(struct task_struct *, irq_workd); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct llist_head, raised_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct llist_head, lazy_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, irq_workd); =20 static void 
wake_irq_workd(void) { diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 284d2722cf0c..aee2b6994bc2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -74,7 +74,7 @@ =20 /* Data structures. */ =20 -static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) =3D { +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rcu_data, rc= u_data) =3D { .dynticks_nesting =3D 1, .dynticks_nmi_nesting =3D DYNTICK_IRQ_NONIDLE, .dynticks =3D ATOMIC_INIT(1), diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e1c08ff4130e..7c96f0001c7f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -43,7 +43,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp); EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp); EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp); =20 -DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rq, runqueues); =20 #ifdef CONFIG_SCHED_DEBUG /* @@ -5104,8 +5104,8 @@ void sched_exec(void) =20 #endif =20 -DEFINE_PER_CPU(struct kernel_stat, kstat); -DEFINE_PER_CPU(struct kernel_cpustat, kernel_cpustat); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_stat, kstat); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_cpustat, kernel_cpustat); =20 EXPORT_PER_CPU_SYMBOL(kstat); EXPORT_PER_CPU_SYMBOL(kernel_cpustat); diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c index 7c2fe50fd76d..c55a47f8e963 100644 --- a/kernel/sched/cpufreq.c +++ b/kernel/sched/cpufreq.c @@ -9,7 +9,8 @@ =20 #include "sched.h" =20 -DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct update_util_data __rcu *, + cpufreq_update_util_data); =20 /** * cpufreq_add_update_util_hook - Populate the CPU's update_util_data poin= ter. 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 623b5feb142a..d3ad13308889 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -17,7 +17,7 @@ * task when irq is in progress while we read rq->clock. That is a worthy * compromise in place of having locks on each irq in account_system_time. */ -DEFINE_PER_CPU(struct irqtime, cpu_irqtime); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct irqtime, cpu_irqtime); =20 static int __asi_not_sensitive sched_clock_irqtime; =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 517c70a29a57..4188c1a570db 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1360,7 +1360,7 @@ static inline void update_idle_core(struct rq *rq) static inline void update_idle_core(struct rq *rq) { } #endif =20 -DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rq, runqueues); =20 #define cpu_rq(cpu) (&per_cpu(runqueues, (cpu))) #define this_rq() this_cpu_ptr(&runqueues) @@ -1760,13 +1760,13 @@ static inline struct sched_domain *lowest_flag_doma= in(int cpu, int flag) return sd; } =20 -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc); -DECLARE_PER_CPU(int, sd_llc_size); -DECLARE_PER_CPU(int, sd_llc_id); -DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_llc); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_size); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_id); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain_shared __rcu *, sd_l= lc_shared); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_numa); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_pac= king); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, 
sd_asym_cpu= capacity); extern struct static_key_false sched_asym_cpucapacity; =20 struct sched_group_capacity { @@ -2753,7 +2753,7 @@ struct irqtime { struct u64_stats_sync sync; }; =20 -DECLARE_PER_CPU(struct irqtime, cpu_irqtime); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irqtime, cpu_irqtime); =20 /* * Returns the irqtime minus the softirq time computed by ksoftirqd. @@ -2776,7 +2776,8 @@ static inline u64 irq_time_read(int cpu) #endif /* CONFIG_IRQ_TIME_ACCOUNTING */ =20 #ifdef CONFIG_CPU_FREQ -DECLARE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct update_util_data __rcu *, + cpufreq_update_util_data); =20 /** * cpufreq_update_util - Take a note about CPU utilization changes. diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index d201a7052a29..1dcea6a6133e 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -641,13 +641,13 @@ static void destroy_sched_domains(struct sched_domain= *sd) * the cpumask of the domain), this allows us to quickly tell if * two CPUs are in the same cache domain, see cpus_share_cache(). 
*/ -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc); -DEFINE_PER_CPU(int, sd_llc_size); -DEFINE_PER_CPU(int, sd_llc_id); -DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_llc); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_size); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_id); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain_shared __rcu *, sd_ll= c_shared); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_numa); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_pack= ing); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_cpuc= apacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); =20 static void update_top_cache_domain(int cpu) diff --git a/kernel/smp.c b/kernel/smp.c index c51fd981a4a9..3c1b328f0a09 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -92,9 +92,10 @@ struct call_function_data { cpumask_var_t cpumask_ipi; }; =20 -static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data); +static DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct call_function_data,= cfd_data); =20 -static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue); +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct llist_head, + call_single_queue); =20 static void flush_smp_call_function_queue(bool warn_cpu_offline); =20 @@ -464,7 +465,7 @@ static __always_inline void csd_unlock(struct __call_si= ngle_data *csd) smp_store_release(&csd->node.u_flags, 0); } =20 -static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data); +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(call_single_data_t,= csd_data); =20 void __smp_call_single_queue(int cpu, struct llist_node *node) { diff --git a/kernel/softirq.c 
b/kernel/softirq.c index c462b7fab4d3..d2660a59feab 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -59,7 +59,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat); static struct softirq_action softirq_vec[NR_SOFTIRQS] __asi_not_sensitive ____cacheline_aligned; =20 -DEFINE_PER_CPU(struct task_struct *, ksoftirqd); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, ksoftirqd); =20 const char * const softirq_to_name[NR_SOFTIRQS] =3D { "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL", diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 8b176f5c01f2..74cfc89a17c4 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -65,7 +65,7 @@ * to reach a base using a clockid, hrtimer_clockid_to_base() * is used to convert from clockid to the proper hrtimer_base_type. */ -DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =3D +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer_cpu_base, hrtimer_bases) = =3D { .lock =3D __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock), .clock_base =3D diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index cbe75661ca74..67180cb44394 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -25,7 +25,7 @@ /* * Tick devices */ -DEFINE_PER_CPU(struct tick_device, tick_cpu_device); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); /* * Tick next event: keeps track of the tick time. It's updated by the * CPU which handles the tick and protected by jiffies_lock. 
There is diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index ed7e2a18060a..6961318d41b7 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -13,7 +13,7 @@ # define TICK_DO_TIMER_NONE -1 # define TICK_DO_TIMER_BOOT -2 =20 -DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); extern ktime_t tick_next_period; extern int tick_do_timer_cpu; =20 @@ -161,7 +161,7 @@ static inline void timers_update_nohz(void) { } #define tick_nohz_active (0) #endif =20 -DECLARE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer_cpu_base, hrtimer_bases); =20 extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem); void timer_clear_idle(void); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c23fecbb68c2..afd393b85577 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -36,7 +36,7 @@ /* * Per-CPU nohz control structure */ -static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_sched, tick_cpu_sched); =20 struct tick_sched *tick_get_tick_sched(int cpu) { diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 0b09c99b568c..9567df187420 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -212,7 +212,7 @@ struct timer_base { struct hlist_head vectors[WHEEL_SIZE]; } ____cacheline_aligned; =20 -static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct timer_base, timer_bases[NR_= BASES]); =20 #ifdef CONFIG_NO_HZ_COMMON =20 diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index eaec3814c5a4..b82f478caf4e 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -106,7 +106,7 @@ dummy_set_flag(struct trace_array *tr, u32 old_flags, u= 32 bit, int set) * tracing is active, only save the comm when a trace event * occurred. 
*/ -static DEFINE_PER_CPU(bool, trace_taskinfo_save); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, trace_taskinfo_save); =20 /* * Kill all tracing for good (never come back). diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptir= q.c index f4938040c228..177de3501677 100644 --- a/kernel/trace/trace_preemptirq.c +++ b/kernel/trace/trace_preemptirq.c @@ -17,7 +17,7 @@ =20 #ifdef CONFIG_TRACE_IRQFLAGS /* Per-cpu variable to prevent redundant calls when IRQs already off */ -static DEFINE_PER_CPU(int, tracing_irq_cpu); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, tracing_irq_cpu); =20 /* * Like trace_hardirqs_on() but without the lockdep invocation. This is diff --git a/kernel/watchdog.c b/kernel/watchdog.c index ad912511a0c0..c2bf55024202 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -174,13 +174,13 @@ static bool softlockup_initialized __read_mostly; static u64 __read_mostly sample_period; =20 /* Timestamp taken after the last successful reschedule. */ -static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, watchdog_touch_ts); /* Timestamp of the last softlockup report. 
*/ -static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); -static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); -static DEFINE_PER_CPU(bool, softlockup_touch_sync); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, watchdog_report_ts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer, watchdog_hrtimer); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, softlockup_touch_sync); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, hrtimer_interrupts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, hrtimer_interrupts_= saved); static unsigned long soft_lockup_nmi_warn; =20 static int __init nowatchdog_setup(char *str) diff --git a/lib/irq_regs.c b/lib/irq_regs.c index 0d545a93070e..8b3c6be06a7a 100644 --- a/lib/irq_regs.c +++ b/lib/irq_regs.c @@ -9,6 +9,6 @@ #include =20 #ifndef ARCH_HAS_OWN_IRQ_REGS -DEFINE_PER_CPU(struct pt_regs *, __irq_regs); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct pt_regs *, __irq_regs); EXPORT_PER_CPU_SYMBOL(__irq_regs); #endif diff --git a/lib/random32.c b/lib/random32.c index a57a0e18819d..e4c1cb1a70b4 100644 --- a/lib/random32.c +++ b/lib/random32.c @@ -339,7 +339,8 @@ struct siprand_state { }; =20 static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entro= py; -DEFINE_PER_CPU(unsigned long, net_rand_noise); +/* TODO(oweisse): Is this entropy sensitive?? 
*/ +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, net_rand_noise); EXPORT_PER_CPU_SYMBOL(net_rand_noise); =20 /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0af973b950c2..8d2d76de5bd0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -110,7 +110,7 @@ static atomic_t hardware_enable_failed; static struct kmem_cache *kvm_vcpu_cache; =20 static __read_mostly struct preempt_ops kvm_preempt_ops; -static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, kvm_running_vcp= u); =20 struct dentry *kvm_debugfs_dir; EXPORT_SYMBOL_GPL(kvm_debugfs_dir); --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44DE4C433EF for ; Wed, 23 Feb 2022 05:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235775AbiBWF3R (ORCPT ); Wed, 23 Feb 2022 00:29:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238459AbiBWF1T (ORCPT ); Wed, 23 Feb 2022 00:27:19 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 049DD6E8F1 for ; Tue, 22 Feb 2022 21:25:31 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d61f6c1877so163280917b3.15 for ; Tue, 22 Feb 2022 21:25:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=1Ak925Pk776XAKxLwtqmq+2ujMHKkPdWGXyWe6mB81I=; b=qruJowqTdMVz7fZkxeG4xVb20iJ53LrE4dT1CKYoVmqpJaSFmZqssZFPvLDlocajG+ 
0ZNfohsXrp0SvmtgoKo+3GpbCl6Ek66LSSQDb5QhwDTmpTY3IZR68hTWLTakDK649FWd NcBzCkwDsR687gBw9hUAZO42EBgmkkjWUtK9c7iThbB5OJvWqulI+ZjA4At9vc3jDdQv V3RzFrXEtxS3BFQakXKK5JYagS7g37QYbi+N7FduEDd6mJWAl6cR1xbe48z+M34xGgJb +aIJO9WdvPWGVowL+cjeTIFBzTQnJ9oyaKZdWhvwquAx5l+fbfWtWxe31CFQ765daVDd WYtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=1Ak925Pk776XAKxLwtqmq+2ujMHKkPdWGXyWe6mB81I=; b=BmcRkeo8byto6JvwNbGcEfoCfGQZu8IPr+L4J+OKIMFEeI/t0mUCXhvNM5k8QOuIVD 4nAPGnWWYRxriAkTYxAUYrP0yvbWzY1MXy/BlRQwgGFbbOFYzxzNI0NlonEeq1sR0eX9 hKa0rqFMqOeWNXZdCnOMJJatJ/3L7NTB8g7DOeiE0oGxn5iE9DR1JUWO3deNAtT2cBNP 8RGWswg22vJMWNCFB2HjNNOo9Qi0VpG4SbK3HaU4cj5aOKNSSn1vZjXiEwqvjuRDEehB Sp1bFC/MQIdKe2FRFqJm6o0AUvA+0DGqTtcGiQ1QDmRd9lZgv9ybSQukkZop/p+l1vVa /jMw== X-Gm-Message-State: AOAM533c9Y1T/urVE5oD7GylglVWh2uQMQ6iqYszf02p6QmN8QBS1PAT hRjzYuM6xb91wi/PQ7C15E3aeQfTrbtg4WyPxLWTOZVcoqyEryJwWHSN1cX3qwoUpmQAmpNAlMr N9P708LP9nE1bN3VwCxOyGuamUgoUVWDr+KkXgEzpBPRFMMuJEtcdjtObo1bux5I14dlGH4Tb X-Google-Smtp-Source: ABdhPJz+8yu7dLdPv2zaC0fb4RR9so3iE3HoEaEWayOmFqlvYUtwhO3rWaAyrQMIf9/n3UzRyvpRD4FVMxcA X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:bad2:0:b0:620:fe28:ff53 with SMTP id a18-20020a25bad2000000b00620fe28ff53mr26733639ybk.340.1645593921610; Tue, 22 Feb 2022 21:25:21 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:19 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-44-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 43/47] mm: asi: Annotation of dynamic variables to be nonsensitive From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, 
alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse The heart of ASI is to differentiate between sensitive and non-sensitive data access. This commit marks certain dynamic allocations as not sensitive. Some dynamic variables are accessed frequently and would therefore cause many ASI exits. The frequency of these accesses is monitored by tracing asi_exits and analyzing the accessed addresses. Many of these variables don't contain sensitive information and can therefore be mapped into the global ASI region. This commit adds the __GFP_LOCAL_NONSENSITIVE/__GFP_GLOBAL_NONSENSITIVE flags to the allocations of these frequently accessed yet non-sensitive variables. The end result is a very significant reduction in ASI exits on real benchmarks. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kernel/apic/x2apic_cluster.c | 2 +- arch/x86/kvm/cpuid.c | 4 ++- arch/x86/kvm/lapic.c | 9 ++++--- arch/x86/kvm/mmu/mmu.c | 7 ++++++ arch/x86/kvm/vmx/vmx.c | 6 +++-- arch/x86/kvm/x86.c | 8 +++--- fs/binfmt_elf.c | 2 +- fs/eventfd.c | 2 +- fs/eventpoll.c | 10 +++++--- fs/exec.c | 2 ++ fs/file.c | 3 ++- fs/timerfd.c | 2 +- include/linux/kvm_host.h | 2 +- include/linux/kvm_types.h | 3 +++ kernel/cgroup/cgroup.c | 4 +-- kernel/events/core.c | 15 +++++++---- kernel/exit.c | 2 ++ kernel/fork.c | 36 +++++++++++++++++++++------ kernel/rcu/srcutree.c | 3 ++- kernel/sched/core.c | 6 +++-- kernel/sched/cpuacct.c | 8 +++--- kernel/sched/fair.c | 3 ++- kernel/sched/topology.c | 14 +++++++---- kernel/smp.c | 17 +++++++------ kernel/trace/ring_buffer.c | 5 ++-- kernel/tracepoint.c | 2 +- lib/radix-tree.c | 6 ++--- mm/memcontrol.c | 7 +++--- mm/util.c | 3 ++- mm/vmalloc.c | 3 ++- net/core/skbuff.c | 2 +- net/core/sock.c | 2 +-
virt/kvm/coalesced_mmio.c | 2 +- virt/kvm/eventfd.c | 5 ++-- virt/kvm/kvm_main.c | 12 ++++++--- 36 files changed, 148 insertions(+), 74 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index b7292c4fece7..34a05add5e77 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1562,7 +1562,8 @@ static inline void kvm_ops_static_call_update(void) #define __KVM_HAVE_ARCH_VM_ALLOC static inline struct kvm *kvm_arch_alloc_vm(void) { - return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO); + return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO | + __GFP_GLOBAL_NONSENSITIVE); } =20 #define __KVM_HAVE_ARCH_VM_FREE diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x= 2apic_cluster.c index 655fe820a240..a1f6eb51ecb7 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -144,7 +144,7 @@ static int alloc_clustermask(unsigned int cpu, int node) } =20 cluster_hotplug_mask =3D kzalloc_node(sizeof(*cluster_hotplug_mask), - GFP_KERNEL, node); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, node); if (!cluster_hotplug_mask) return -ENOMEM; cluster_hotplug_mask->node =3D node; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 07e9215e911d..dedabfdd292e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -310,7 +310,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, if (IS_ERR(e)) return PTR_ERR(e); =20 - e2 =3D kvmalloc_array(cpuid->nent, sizeof(*e2), GFP_KERNEL_ACCOUNT); + e2 =3D kvmalloc_array(cpuid->nent, sizeof(*e2), + GFP_KERNEL_ACCOUNT | + __GFP_LOCAL_NONSENSITIVE); if (!e2) { r =3D -ENOMEM; goto out_free_cpuid; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 213bbdfab49e..3a550299f015 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -213,7 +213,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm) =20 new =3D kvzalloc(sizeof(struct kvm_apic_map) + 
sizeof(struct kvm_lapic *) * ((u64)max_id + 1), - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); =20 if (!new) goto out; @@ -993,7 +993,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, str= uct kvm_lapic *src, *r =3D -1; =20 if (irq->shorthand =3D=3D APIC_DEST_SELF) { - *r =3D kvm_apic_set_irq(src->vcpu, irq, dest_map); + *r =3D kvm_apic_set_irq(src->vcpu, irq, dest_map); return true; } =20 @@ -2455,13 +2455,14 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int tim= er_advance_ns) =20 ASSERT(vcpu !=3D NULL); =20 - apic =3D kzalloc(sizeof(*apic), GFP_KERNEL_ACCOUNT); + apic =3D kzalloc(sizeof(*apic), GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSI= TIVE); if (!apic) goto nomem; =20 vcpu->arch.apic =3D apic; =20 - apic->regs =3D (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT); + apic->regs =3D (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT + | __GFP_LOCAL_NONSENSITIVE); if (!apic->regs) { printk(KERN_ERR "malloc apic regs error for vcpu %x\n", vcpu->vcpu_id); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5785a0d02558..a2ada1104c2d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5630,6 +5630,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (static_cpu_has(X86_FEATURE_ASI) && mm_asi_enabled(current->mm)) + vcpu->arch.mmu_shadow_page_cache.gfp_asi =3D + __GFP_LOCAL_NONSENSI= TIVE; + else + vcpu->arch.mmu_shadow_page_cache.gfp_asi =3D 0; +#endif =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e1ad82c25a78..6e1bb017b696 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2629,7 +2629,7 @@ void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs =3D NULL; if (loaded_vmcs->msr_bitmap) 
- free_page((unsigned long)loaded_vmcs->msr_bitmap); + kfree(loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs !=3D NULL); } =20 @@ -2648,7 +2648,9 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) =20 if (cpu_has_vmx_msr_bitmap()) { loaded_vmcs->msr_bitmap =3D (unsigned long *) - __get_free_page(GFP_KERNEL_ACCOUNT); + kzalloc(PAGE_SIZE, + GFP_KERNEL_ACCOUNT | + __GFP_LOCAL_NONSENSITIVE ); if (!loaded_vmcs->msr_bitmap) goto out_vmcs; memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 451872d178e5..dd862edc1b5a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -329,7 +329,8 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void) =20 return kmem_cache_create_usercopy("x86_emulator", size, __alignof__(struct x86_emulate_ctxt), - SLAB_ACCOUNT, useroffset, + SLAB_ACCOUNT|SLAB_LOCAL_NONSENSITIVE, + useroffset, size - useroffset, NULL); } =20 @@ -10969,7 +10970,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) =20 r =3D -ENOMEM; =20 - page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENS= ITIVE); if (!page) goto fail_free_lapic; vcpu->arch.pio_data =3D page_address(page); @@ -11718,7 +11719,8 @@ static int kvm_alloc_memslot_metadata(struct kvm *k= vm, =20 lpages =3D __kvm_mmu_slot_lpages(slot, npages, level); =20 - linfo =3D kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT); + linfo =3D kvcalloc(lpages, sizeof(*linfo), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!linfo) goto out_free; =20 diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index f8c7f26f1fbb..b0550951da59 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -477,7 +477,7 @@ static struct elf_phdr *load_elf_phdrs(const struct elf= hdr *elf_ex, if (size =3D=3D 0 || size > 65536 || size > ELF_MIN_ALIGN) goto out; =20 - elf_phdata =3D kmalloc(size, GFP_KERNEL); + elf_phdata =3D kmalloc(size, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if 
(!elf_phdata) goto out; =20 diff --git a/fs/eventfd.c b/fs/eventfd.c index 3627dd7d25db..c748433e52af 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -415,7 +415,7 @@ static int do_eventfd(unsigned int count, int flags) if (flags & ~EFD_FLAGS_SET) return -EINVAL; =20 - ctx =3D kmalloc(sizeof(*ctx), GFP_KERNEL); + ctx =3D kmalloc(sizeof(*ctx), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return -ENOMEM; =20 diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 06f4c5ae1451..b28826c9f079 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1239,7 +1239,7 @@ static void ep_ptable_queue_proc(struct file *file, w= ait_queue_head_t *whead, if (unlikely(!epi)) // an earlier allocation has failed return; =20 - pwq =3D kmem_cache_alloc(pwq_cache, GFP_KERNEL); + pwq =3D kmem_cache_alloc(pwq_cache, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIV= E); if (unlikely(!pwq)) { epq->epi =3D NULL; return; @@ -1453,7 +1453,8 @@ static int ep_insert(struct eventpoll *ep, const stru= ct epoll_event *event, return -ENOSPC; percpu_counter_inc(&ep->user->epoll_watches); =20 - if (!(epi =3D kmem_cache_zalloc(epi_cache, GFP_KERNEL))) { + if (!(epi =3D kmem_cache_zalloc(epi_cache, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITI= VE))) { percpu_counter_dec(&ep->user->epoll_watches); return -ENOMEM; } @@ -2373,11 +2374,12 @@ static int __init eventpoll_init(void) =20 /* Allocates slab cache used to allocate "struct epitem" items */ epi_cache =3D kmem_cache_create("eventpoll_epi", sizeof(struct epitem), - 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL); + 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE,= NULL); =20 /* Allocates slab cache used to allocate "struct eppoll_entry" */ pwq_cache =3D kmem_cache_create("eventpoll_pwq", - sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); + sizeof(struct eppoll_entry), 0, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, NULL); =20 ephead_cache =3D kmem_cache_create("ep_head", sizeof(struct epitems_head), 0, 
SLAB_PANIC|SLAB_ACCOUNT, NULL); diff --git a/fs/exec.c b/fs/exec.c index 537d92c41105..76f3b433e80d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1238,6 +1238,8 @@ int begin_new_exec(struct linux_binprm * bprm) struct task_struct *me =3D current; int retval; =20 + /* TODO: (oweisse) unmap the stack from ASI */ + /* Once we are committed compute the creds */ retval =3D bprm_creds_from_file(bprm); if (retval) diff --git a/fs/file.c b/fs/file.c index 97d212a9b814..85bfa5d70323 100644 --- a/fs/file.c +++ b/fs/file.c @@ -117,7 +117,8 @@ static struct fdtable * alloc_fdtable(unsigned int nr) if (!fdt) goto out; fdt->max_fds =3D nr; - data =3D kvmalloc_array(nr, sizeof(struct file *), GFP_KERNEL_ACCOUNT); + data =3D kvmalloc_array(nr, sizeof(struct file *), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIV= E); if (!data) goto out_fdt; fdt->fd =3D data; diff --git a/fs/timerfd.c b/fs/timerfd.c index e9c96a0c79f1..385fbb29837d 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -425,7 +425,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flag= s) !capable(CAP_WAKE_ALARM)) return -EPERM; =20 - ctx =3D kzalloc(sizeof(*ctx), GFP_KERNEL); + ctx =3D kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return -ENOMEM; =20 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f31f7442eced..dfbb26d7a185 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1085,7 +1085,7 @@ int kvm_arch_create_vm_debugfs(struct kvm *kvm); */ static inline struct kvm *kvm_arch_alloc_vm(void) { - return kzalloc(sizeof(struct kvm), GFP_KERNEL); + return kzalloc(sizeof(struct kvm), GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE); } #endif =20 diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 234eab059839..a5a810db85ca 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -64,6 +64,9 @@ struct gfn_to_hva_cache { struct kvm_mmu_memory_cache { int nobjs; gfp_t gfp_zero; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + gfp_t 
gfp_asi; +#endif struct kmem_cache *kmem_cache; void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE]; }; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 729495e17363..79692dafd2be 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1221,7 +1221,7 @@ static struct css_set *find_css_set(struct css_set *o= ld_cset, if (cset) return cset; =20 - cset =3D kzalloc(sizeof(*cset), GFP_KERNEL); + cset =3D kzalloc(sizeof(*cset), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cset) return NULL; =20 @@ -5348,7 +5348,7 @@ static struct cgroup *cgroup_create(struct cgroup *pa= rent, const char *name, =20 /* allocate the cgroup and its ID, 0 is reserved for the root */ cgrp =3D kzalloc(struct_size(cgrp, ancestor_ids, (level + 1)), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cgrp) return ERR_PTR(-ENOMEM); =20 diff --git a/kernel/events/core.c b/kernel/events/core.c index 1914cc538cab..64eeb2c67d92 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4586,7 +4586,8 @@ alloc_perf_context(struct pmu *pmu, struct task_struc= t *task) { struct perf_event_context *ctx; =20 - ctx =3D kzalloc(sizeof(struct perf_event_context), GFP_KERNEL); + ctx =3D kzalloc(sizeof(struct perf_event_context), + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return NULL; =20 @@ -11062,7 +11063,8 @@ int perf_pmu_register(struct pmu *pmu, const char *= name, int type) =20 mutex_lock(&pmus_lock); ret =3D -ENOMEM; - pmu->pmu_disable_count =3D alloc_percpu(int); + pmu->pmu_disable_count =3D alloc_percpu_gfp(int, + GFP_KERNEL | __GFP_GLOBAL_NONSENSI= TIVE); if (!pmu->pmu_disable_count) goto unlock; =20 @@ -11112,7 +11114,8 @@ int perf_pmu_register(struct pmu *pmu, const char *= name, int type) goto got_cpu_context; =20 ret =3D -ENOMEM; - pmu->pmu_cpu_context =3D alloc_percpu(struct perf_cpu_context); + pmu->pmu_cpu_context =3D alloc_percpu_gfp(struct perf_cpu_context, + GFP_KERNEL | __GFP_GLOBAL_= NONSENSITIVE); if (!pmu->pmu_cpu_context) goto 
free_dev; =20 @@ -11493,7 +11496,8 @@ perf_event_alloc(struct perf_event_attr *attr, int = cpu, } =20 node =3D (cpu >=3D 0) ? cpu_to_node(cpu) : -1; - event =3D kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO, + event =3D kmem_cache_alloc_node(perf_event_cache, + GFP_KERNEL | __GFP_ZERO | __GFP_GLOB= AL_NONSENSITIVE, node); if (!event) return ERR_PTR(-ENOMEM); @@ -13378,7 +13382,8 @@ void __init perf_event_init(void) ret =3D init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); =20 - perf_event_cache =3D KMEM_CACHE(perf_event, SLAB_PANIC); + perf_event_cache =3D KMEM_CACHE(perf_event, + SLAB_PANIC | SLAB_GLOBAL_NONSENSITIV= E); =20 /* * Build time assertion that we keep the data_head at the intended diff --git a/kernel/exit.c b/kernel/exit.c index f702a6a63686..ab2749cf6887 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -768,6 +768,8 @@ void __noreturn do_exit(long code) profile_task_exit(tsk); kcov_task_exit(tsk); =20 + /* TODO: (oweisse) unmap the stack from ASI */ + coredump_task_exit(tsk); ptrace_event(PTRACE_EVENT_EXIT, code); =20 diff --git a/kernel/fork.c b/kernel/fork.c index d7f55de00947..cb147a72372d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -168,6 +168,8 @@ static struct kmem_cache *task_struct_cachep; =20 static inline struct task_struct *alloc_task_struct_node(int node) { + /* TODO: Figure out how to allocate this properly to ASI process map.= This + * should be mapped in a __GFP_LOCAL_NONSENSITIVE slab.
*/ return kmem_cache_alloc_node(task_struct_cachep, GFP_KERNEL, node); } =20 @@ -214,6 +216,7 @@ static int free_vm_stack_cache(unsigned int cpu) =20 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) { + /* TODO: (oweisse) Add annotation to map the stack into ASI */ #ifdef CONFIG_VMAP_STACK void *stack; int i; @@ -242,9 +245,13 @@ static unsigned long *alloc_thread_stack_node(struct t= ask_struct *tsk, int node) * so memcg accounting is performed manually on assigning/releasing * stacks to tasks. Drop __GFP_ACCOUNT. */ + /* ASI: We intentionally don't pass VM_LOCAL_NONSENSITIVE nor + * __GFP_LOCAL_NONSENSITIVE since we don't have an mm yet. Later o= n we'll + * map the stack into the mm asi map. That being said, we do care = about + * the stack being allocated below VMALLOC_LOCAL_NONSENSITIVE_END */ stack =3D __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, - VMALLOC_START, VMALLOC_END, - THREADINFO_GFP & ~__GFP_ACCOUNT, + VMALLOC_START, VMALLOC_LOCAL_NONSENSITIVE_END, + (THREADINFO_GFP & (~__GFP_ACCOUNT)), PAGE_KERNEL, 0, node, __builtin_return_address(0)); =20 @@ -346,7 +353,8 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *= mm) { struct vm_area_struct *vma; =20 - vma =3D kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); + vma =3D kmem_cache_alloc(vm_area_cachep, + GFP_KERNEL); if (vma) vma_init(vma, mm); return vma; @@ -683,6 +691,8 @@ static void check_mm(struct mm_struct *mm) #endif } =20 +/* TODO: (oweisse) ASI: we need to allocate mm such that it will only be v= isible + * within itself. */ #define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL)) #define free_mm(mm) (kmem_cache_free(mm_cachep, (mm))) =20 @@ -823,9 +833,12 @@ void __init fork_init(void) =20 /* create a slab on which task_structs can be allocated */ task_struct_whitelist(&useroffset, &usersize); + /* TODO: (oweisse) for the time being this cache is shared among a= ll tasks.
We + * mark it SLAB_NONSENSITIVE so task_struct can be accessed within ASI. + * A final secure solution should have this memory LOCAL, not GLOB= AL.*/ task_struct_cachep =3D kmem_cache_create_usercopy("task_struct", arch_task_struct_size, align, - SLAB_PANIC|SLAB_ACCOUNT, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, useroffset, usersize, NULL); #endif =20 @@ -1601,6 +1614,7 @@ static int copy_sighand(unsigned long clone_flags, st= ruct task_struct *tsk) refcount_inc(&current->sighand->count); return 0; } + /* TODO: (oweisse) ASI replace with proper ASI allocation. */ sig =3D kmem_cache_alloc(sighand_cachep, GFP_KERNEL); RCU_INIT_POINTER(tsk->sighand, sig); if (!sig) @@ -1649,6 +1663,8 @@ static int copy_signal(unsigned long clone_flags, str= uct task_struct *tsk) if (clone_flags & CLONE_THREAD) return 0; =20 + /* TODO: (oweisse) figure out how to properly allocate this in ASI= for local + * process */ sig =3D kmem_cache_zalloc(signal_cachep, GFP_KERNEL); tsk->signal =3D sig; if (!sig) @@ -2923,7 +2939,8 @@ void __init proc_caches_init(void) SLAB_ACCOUNT, sighand_ctor); signal_cachep =3D kmem_cache_create("signal_cache", sizeof(struct signal_struct), 0, - SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT| + SLAB_GLOBAL_NONSENSITIVE, NULL); files_cachep =3D kmem_cache_create("files_cache", sizeof(struct files_struct), 0, @@ -2941,13 +2958,18 @@ void __init proc_caches_init(void) */ mm_size =3D sizeof(struct mm_struct) + cpumask_size(); =20 + /* TODO: (oweisse) ASI replace with proper ASI allocation.
*/ mm_cachep =3D kmem_cache_create_usercopy("mm_struct", mm_size, ARCH_MIN_MMSTRUCT_ALIGN, - SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT + |SLAB_GLOBAL_NONSENSITIVE, offsetof(struct mm_struct, saved_auxv), sizeof_field(struct mm_struct, saved_auxv), NULL); - vm_area_cachep =3D KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT); + + /* TODO: (oweisse) ASI replace with proper ASI allocation. */ + vm_area_cachep =3D KMEM_CACHE(vm_area_struct, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_LOCAL_NON= SENSITIVE); mmap_init(); nsproxy_cache_init(); } diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c index 6833d8887181..553221503803 100644 --- a/kernel/rcu/srcutree.c +++ b/kernel/rcu/srcutree.c @@ -171,7 +171,8 @@ static int init_srcu_struct_fields(struct srcu_struct *= ssp, bool is_static) atomic_set(&ssp->srcu_barrier_cpu_cnt, 0); INIT_DELAYED_WORK(&ssp->work, process_srcu); if (!is_static) - ssp->sda =3D alloc_percpu(struct srcu_data); + ssp->sda =3D alloc_percpu_gfp(struct srcu_data, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ssp->sda) return -ENOMEM; init_srcu_struct_nodes(ssp); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7c96f0001c7f..7515f0612f5c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9329,7 +9329,8 @@ void __init sched_init(void) #endif /* CONFIG_RT_GROUP_SCHED */ =20 #ifdef CONFIG_CGROUP_SCHED - task_group_cache =3D KMEM_CACHE(task_group, 0); + /* TODO: (oweisse) add SLAB_NONSENSITIVE */ + task_group_cache =3D KMEM_CACHE(task_group, SLAB_GLOBAL_NONSENSITIVE); =20 list_add(&root_task_group.list, &task_groups); INIT_LIST_HEAD(&root_task_group.children); @@ -9741,7 +9742,8 @@ struct task_group *sched_create_group(struct task_gro= up *parent) { struct task_group *tg; =20 - tg =3D kmem_cache_alloc(task_group_cache, GFP_KERNEL | __GFP_ZERO); + tg =3D kmem_cache_alloc(task_group_cache, + GFP_KERNEL | __GFP_ZERO | __GFP_GLOBAL_NONSE= NSITIVE); if (!tg) return ERR_PTR(-ENOMEM); =20 diff
--git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 6e3da149125c..e8b0b29b4d37 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -64,15 +64,17 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_cs= s) if (!parent_css) return &root_cpuacct.css; =20 - ca =3D kzalloc(sizeof(*ca), GFP_KERNEL); + ca =3D kzalloc(sizeof(*ca), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ca) goto out; =20 - ca->cpuusage =3D alloc_percpu(struct cpuacct_usage); + ca->cpuusage =3D alloc_percpu_gfp(struct cpuacct_usage, + GFP_KERNEL | __GFP_GLOBAL_NONSENSI= TIVE); if (!ca->cpuusage) goto out_free_ca; =20 - ca->cpustat =3D alloc_percpu(struct kernel_cpustat); + ca->cpustat =3D alloc_percpu_gfp(struct kernel_cpustat, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ca->cpustat) goto out_free_cpuusage; =20 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index dc9b6133b059..97d70f1eb2c5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -11486,7 +11486,8 @@ int alloc_fair_sched_group(struct task_group *tg, s= truct task_group *parent) =20 for_each_possible_cpu(i) { cfs_rq =3D kzalloc_node(sizeof(struct cfs_rq), - GFP_KERNEL, cpu_to_node(i)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(i)); if (!cfs_rq) goto err; =20 diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 1dcea6a6133e..2ad96c78306c 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -569,7 +569,7 @@ static struct root_domain *alloc_rootdomain(void) { struct root_domain *rd; =20 - rd =3D kzalloc(sizeof(*rd), GFP_KERNEL); + rd =3D kzalloc(sizeof(*rd), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!rd) return NULL; =20 @@ -2044,21 +2044,24 @@ static int __sdt_alloc(const struct cpumask *cpu_ma= p) struct sched_group_capacity *sgc; =20 sd =3D kzalloc_node(sizeof(struct sched_domain) + cpumask_size(), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sd) return -ENOMEM; =20 *per_cpu_ptr(sdd->sd, j) 
=3D sd; =20 sds =3D kzalloc_node(sizeof(struct sched_domain_shared), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sds) return -ENOMEM; =20 *per_cpu_ptr(sdd->sds, j) =3D sds; =20 sg =3D kzalloc_node(sizeof(struct sched_group) + cpumask_size(), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sg) return -ENOMEM; =20 @@ -2067,7 +2070,8 @@ static int __sdt_alloc(const struct cpumask *cpu_map) *per_cpu_ptr(sdd->sg, j) =3D sg; =20 sgc =3D kzalloc_node(sizeof(struct sched_group_capacity) + cpumask_size= (), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sgc) return -ENOMEM; =20 diff --git a/kernel/smp.c b/kernel/smp.c index 3c1b328f0a09..db9ab5a58e2c 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -103,15 +103,18 @@ int smpcfd_prepare_cpu(unsigned int cpu) { struct call_function_data *cfd =3D &per_cpu(cfd_data, cpu); =20 - if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL, + if (!zalloc_cpumask_var_node(&cfd->cpumask, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, cpu_to_node(cpu))) return -ENOMEM; - if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL, + if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, cpu_to_node(cpu))) { free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->pcpu =3D alloc_percpu(struct cfd_percpu); + cfd->pcpu =3D alloc_percpu_gfp(struct cfd_percpu, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cfd->pcpu) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); @@ -179,10 +182,10 @@ static int __init csdlock_debug(char *str) } early_param("csdlock_debug", csdlock_debug); =20 -static DEFINE_PER_CPU(call_single_data_t *, cur_csd); -static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); -static DEFINE_PER_CPU(void *, cur_csd_info); -static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local); +static 
DEFINE_PER_CPU_ASI_NOT_SENSITIVE(call_single_data_t *, cur_csd); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(smp_call_func_t, cur_csd_func); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(void *, cur_csd_info); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cfd_seq_local, cfd_seq_loca= l); =20 #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC) static atomic_t csd_bug_count =3D ATOMIC_INIT(0); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 2699e9e562b1..9ad7d4569d4b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1539,7 +1539,8 @@ static int __rb_allocate_pages(struct ring_buffer_per= _cpu *cpu_buffer, * gracefully without invoking oom-killer and the system is not * destabilized. */ - mflags =3D GFP_KERNEL | __GFP_RETRY_MAYFAIL; + /* TODO(oweisse): this is a hack to enable ASI tracing. */ + mflags =3D GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_GLOBAL_NONSENSITIVE; =20 /* * If a user thread allocates too much, and si_mem_available() @@ -1718,7 +1719,7 @@ struct trace_buffer *__ring_buffer_alloc(unsigned lon= g size, unsigned flags, =20 /* keep it in its own cache line */ buffer =3D kzalloc(ALIGN(sizeof(*buffer), cache_line_size()), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!buffer) return NULL; =20 diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 64ea283f2f86..0ae6c38ee121 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -107,7 +107,7 @@ static void tp_stub_func(void) static inline void *allocate_probes(int count) { struct tp_probes *p =3D kmalloc(struct_size(p, probes, count), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); return p =3D=3D NULL ? NULL : p->probes; } =20 diff --git a/lib/radix-tree.c b/lib/radix-tree.c index b3afafe46fff..c7d3342a7b30 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -248,8 +248,7 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree= _node *parent, * cache first for the new node to get accounted to the memory * cgroup. 
*/ - ret =3D kmem_cache_alloc(radix_tree_node_cachep, - gfp_mask | __GFP_NOWARN); + ret =3D kmem_cache_alloc(radix_tree_node_cachep, gfp_mask | __GFP_NOWARN= ); if (ret) goto out; =20 @@ -1597,9 +1596,10 @@ void __init radix_tree_init(void) BUILD_BUG_ON(RADIX_TREE_MAX_TAGS + __GFP_BITS_SHIFT > 32); BUILD_BUG_ON(ROOT_IS_IDR & ~GFP_ZONEMASK); BUILD_BUG_ON(XA_CHUNK_SIZE > 255); + /*TODO: (oweisse) ASI add SLAB_NONSENSITIVE */ radix_tree_node_cachep =3D kmem_cache_create("radix_tree_node", sizeof(struct radix_tree_node), 0, - SLAB_PANIC | SLAB_RECLAIM_ACCOUNT, + SLAB_PANIC | SLAB_RECLAIM_ACCOUNT | SLAB_GLOBAL_NONSENSITIVE, radix_tree_node_ctor); ret =3D cpuhp_setup_state_nocalls(CPUHP_RADIX_DEAD, "lib/radix:dead", NULL, radix_tree_cpu_dead); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a66d6b222ecf..fbc42e96b157 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5143,20 +5143,21 @@ static struct mem_cgroup *mem_cgroup_alloc(void) size =3D sizeof(struct mem_cgroup); size +=3D nr_node_ids * sizeof(struct mem_cgroup_per_node *); =20 - memcg =3D kzalloc(size, GFP_KERNEL); + memcg =3D kzalloc(size, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!memcg) return ERR_PTR(error); =20 memcg->id.id =3D idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX, - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (memcg->id.id < 0) { error =3D memcg->id.id; goto fail; } =20 memcg->vmstats_percpu =3D alloc_percpu_gfp(struct memcg_vmstats_percpu, - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | + __GFP_GLOBAL_NONSENSITIVE= ); if (!memcg->vmstats_percpu) goto fail; =20 diff --git a/mm/util.c b/mm/util.c index 741ba32a43ac..0a49e15a0765 100644 --- a/mm/util.c +++ b/mm/util.c @@ -196,7 +196,8 @@ void *vmemdup_user(const void __user *src, size_t len) { void *p; =20 - p =3D kvmalloc(len, GFP_USER); + /* TODO(oweisse): is this secure? 
*/ + p =3D kvmalloc(len, GFP_USER | __GFP_LOCAL_NONSENSITIVE); if (!p) return ERR_PTR(-ENOMEM); =20 diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a89866a926f6..659560f286b0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3309,7 +3309,8 @@ EXPORT_SYMBOL(vzalloc); void *vmalloc_user(unsigned long size) { return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END, - GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL, + GFP_KERNEL | __GFP_ZERO + | __GFP_LOCAL_NONSENSITIVE, PAGE_KERNE= L, VM_USERMAP, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 909db87d7383..ce8c331386fb 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -404,7 +404,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gf= p_mask, ? skbuff_fclone_cache : skbuff_head_cache; =20 if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX)) - gfp_mask |=3D __GFP_MEMALLOC; + gfp_mask |=3D __GFP_MEMALLOC | __GFP_GLOBAL_NONSENSITIVE; =20 /* Get the HEAD */ if ((flags & (SKB_ALLOC_FCLONE | SKB_ALLOC_NAPI)) =3D=3D SKB_ALLOC_NAPI && diff --git a/net/core/sock.c b/net/core/sock.c index 41e91d0f7061..6f6e0bd5ebf1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2704,7 +2704,7 @@ bool skb_page_frag_refill(unsigned int sz, struct pag= e_frag *pfrag, gfp_t gfp) /* Avoid direct reclaim but allow kswapd to wake */ pfrag->page =3D alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN | - __GFP_NORETRY, + __GFP_NORETRY | __GFP_GLOBAL_NONSENSITIVE, SKB_FRAG_PAGE_ORDER); if (likely(pfrag->page)) { pfrag->size =3D PAGE_SIZE << SKB_FRAG_PAGE_ORDER; diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 0be80c213f7f..5b87476566c4 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -111,7 +111,7 @@ int kvm_coalesced_mmio_init(struct kvm *kvm) { struct page *page; =20 - page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENS= ITIVE); if 
(!page) return -ENOMEM; =20 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2ad013b8bde9..40acb841135c 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -306,7 +306,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *arg= s) if (!kvm_arch_irqfd_allowed(kvm, args)) return -EINVAL; =20 - irqfd =3D kzalloc(sizeof(*irqfd), GFP_KERNEL_ACCOUNT); + irqfd =3D kzalloc(sizeof(*irqfd), + GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE); if (!irqfd) return -ENOMEM; =20 @@ -813,7 +814,7 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm, if (IS_ERR(eventfd)) return PTR_ERR(eventfd); =20 - p =3D kzalloc(sizeof(*p), GFP_KERNEL_ACCOUNT); + p =3D kzalloc(sizeof(*p), GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE); if (!p) { ret =3D -ENOMEM; goto fail; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8d2d76de5bd0..587a75428da8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -370,6 +370,9 @@ static inline void *mmu_memory_cache_alloc_obj(struct k= vm_mmu_memory_cache *mc, gfp_t gfp_flags) { gfp_flags |=3D mc->gfp_zero; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + gfp_flags |=3D mc->gfp_asi; +#endif =20 if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); @@ -863,7 +866,8 @@ static struct kvm_memslots *kvm_alloc_memslots(void) int i; struct kvm_memslots *slots; =20 - slots =3D kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); + slots =3D kvzalloc(sizeof(struct kvm_memslots), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!slots) return NULL; =20 @@ -1529,7 +1533,7 @@ static struct kvm_memslots *kvm_dup_memslots(struct k= vm_memslots *old, else new_size =3D kvm_memslots_size(old->used_slots); =20 - slots =3D kvzalloc(new_size, GFP_KERNEL_ACCOUNT); + slots =3D kvzalloc(new_size, GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIV= E); if (likely(slots)) kvm_copy_memslots(slots, old); =20 @@ -3565,7 +3569,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, = u32 id) } =20 BUILD_BUG_ON(sizeof(struct 
kvm_run) > PAGE_SIZE); - page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENS= ITIVE); if (!page) { r =3D -ENOMEM; goto vcpu_free; @@ -4959,7 +4963,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm= _bus bus_idx, gpa_t addr, return -ENOSPC; =20 new_bus =3D kmalloc(struct_size(bus, range, bus->dev_count + 1), - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!new_bus) return -ENOMEM; =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B38CC433F5 for ; Wed, 23 Feb 2022 05:28:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238440AbiBWF3L (ORCPT ); Wed, 23 Feb 2022 00:29:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238466AbiBWF1U (ORCPT ); Wed, 23 Feb 2022 00:27:20 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40C066EB0C for ; Tue, 22 Feb 2022 21:25:32 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d7b96d74f8so26402897b3.16 for ; Tue, 22 Feb 2022 21:25:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=3oY760HlCqyP72oYrU3xHrMJFplDw11mb6Xbtw2c1Ls=; b=D9vuMKstkbgF/aCBQRDDYJkPE6oiAauxsU3dhO3caTQi1Ors7zFJtimuNwMlMum/ya fAcTI4uIzYyzpdE2ZrEf/e06dc5vqx0LmPw1pv8n1QaD5XqmpCV/TIWB6E1NCWUBPvjx kP+mi1IO7e5PuquN9CtvonrWLH58I2l06WKRG94h1m346iPt6qcvJobAmVLWvT/Por/W 
KXklWi6I38xsiROhwJwrwH5FvLqV78uNDxYi99AIQIOYR6bJ6jvCc3gKNOAsKyQCZ1fM DYgbo1P+vArTAQISb6AwV92QkktvbooeM3ZvW1x8EQlFOcrk6E/f/85+B3yFEDRNVe/6 OS4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=3oY760HlCqyP72oYrU3xHrMJFplDw11mb6Xbtw2c1Ls=; b=sQH53AeFlbdvLx4iXZo5c3mfi5TFZoJvPUFXZcxWsE2hajJ2D0+vKpRAMbKNFnCgG8 ea8q8DqhWIuzqwJJm3296tHjwetTpQSn1Y979SbEUkxDvXh4MXqIL0K0a6B52aUK8m2K jG26+4ocxTHi5dJ0AkCsDGtkqxs+vKvPyVkDWiVBOKPgnLgkwENxsXvpr8zw5GfgP25+ ncpIVjQlAE7vgRQmSOTpfacLCaPmiXwG7iF+c2hZay08FoptNVHmvuftMrfoj7WMfYUb MTK/pu+UAKhH+m2kT9d7pPuwHL4IGqxvWmAiIQCTKFLK8t0p/M6JI7zdDv+EI50gQfjl ClaA== X-Gm-Message-State: AOAM532c9Vhbi/cFJIbRRQTbKFO0ovo43EqzMOCO5s/zX8+LwJ7UgWjZ X0Lmnn4bKcAzcgq8M9k0HjyLbIHiOs9i8FUlc5WATUpU0rkfU78q8WirkZ6MHVk0IABwPz6M7l6 jjjibwDB5FsCYW7aT1+KKwAFpqhsuxVwLbkLqhdJ69cRfha3IFtcMxegDJXmJclRwmSevaRlN X-Google-Smtp-Source: ABdhPJwRtpaU+ffn34I6BuLI3XRUIkza37lNEpNOPEWvhLHQmrQIvwguXasM2lDIKujjHT1vMgrO3eyensjW X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1186:0:b0:2d6:a30d:fd86 with SMTP id 128-20020a811186000000b002d6a30dfd86mr26352453ywr.160.1645593924065; Tue, 22 Feb 2022 21:25:24 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:20 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-45-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 44/47] kvm: asi: Splitting kvm_vcpu_arch into non/sensitive parts From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, 
linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse The part that was allocated via ASI LOCAL SENSITIVE is in `struct kvm_vcpu_arch_private`. The rest is in `struct kvm_vcpu_arch`. The latter contains a pointer `private` which is allocated to be ASI non-sensitive from a cache. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/kvm_host.h | 109 ++++++++++++---------- arch/x86/kvm/cpuid.c | 14 +-- arch/x86/kvm/kvm_cache_regs.h | 22 ++--- arch/x86/kvm/mmu.h | 10 +- arch/x86/kvm/mmu/mmu.c | 138 +++++++++++++-------------- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 26 +++--- arch/x86/kvm/mmu/spte.c | 4 +- arch/x86/kvm/mmu/tdp_mmu.c | 14 +-- arch/x86/kvm/svm/nested.c | 34 +++---- arch/x86/kvm/svm/sev.c | 70 +++++++------- arch/x86/kvm/svm/svm.c | 52 +++++------ arch/x86/kvm/trace.h | 10 +- arch/x86/kvm/vmx/nested.c | 68 +++++++------- arch/x86/kvm/vmx/vmx.c | 64 ++++++------- arch/x86/kvm/x86.c | 160 ++++++++++++++++---------------- arch/x86/kvm/x86.h | 2 +- virt/kvm/kvm_main.c | 38 ++++++-- 18 files changed, 436 insertions(+), 401 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 34a05add5e77..d7315f86f85c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -606,14 +606,12 @@ struct kvm_vcpu_xen { u64 runstate_times[4]; }; =20 -struct kvm_vcpu_arch { - /* +struct kvm_vcpu_arch_private { + /* * rip and regs accesses must go through * kvm_{register,rip}_{read,write} functions. */ unsigned long regs[NR_VCPU_REGS]; - u32 regs_avail; - u32 regs_dirty; =20 unsigned long cr0; unsigned long cr0_guest_owned_bits; @@ -623,6 +621,63 @@ struct kvm_vcpu_arch { unsigned long cr4_guest_owned_bits; unsigned long cr4_guest_rsvd_bits; unsigned long cr8; + + /* + * QEMU userspace and the guest each have their own FPU state. 
+ * In vcpu_run, we switch between the user and guest FPU contexts. + * While running a VCPU, the VCPU thread will have the guest FPU + * context. + * + * Note that while the PKRU state lives inside the fpu registers, + * it is switched out separately at VMENTER and VMEXIT time. The + * "guest_fpstate" state here contains the guest FPU context, with the + * host PRKU bits. + */ + struct fpu_guest guest_fpu; + + u64 xcr0; + u64 guest_supported_xcr0; + + /* + * Paging state of the vcpu + * + * If the vcpu runs in guest mode with two level paging this still saves + * the paging mode of the l1 guest. This context is always used to + * handle faults. + */ + struct kvm_mmu *mmu; + + /* Non-nested MMU for L1 */ + struct kvm_mmu root_mmu; + + /* L1 MMU when running nested */ + struct kvm_mmu guest_mmu; + + /* + * Pointer to the mmu context currently used for + * gva_to_gpa translations. + */ + struct kvm_mmu *walk_mmu; + + /* + * Paging state of an L2 guest (used for nested npt) + * + * This context will save all necessary information to walk page tables + * of an L2 guest. This context is only initialized for page table + * walking and not for faulting since we never handle l2 page faults on + * the host. + */ + struct kvm_mmu nested_mmu; + + struct x86_emulate_ctxt *emulate_ctxt; +}; + +struct kvm_vcpu_arch { + struct kvm_vcpu_arch_private *private; +=09 + u32 regs_avail; + u32 regs_dirty; + u32 host_pkru; u32 pkru; u32 hflags; @@ -645,36 +700,6 @@ struct kvm_vcpu_arch { u64 arch_capabilities; u64 perf_capabilities; =20 - /* - * Paging state of the vcpu - * - * If the vcpu runs in guest mode with two level paging this still saves - * the paging mode of the l1 guest. This context is always used to - * handle faults. 
- */ - struct kvm_mmu *mmu; - - /* Non-nested MMU for L1 */ - struct kvm_mmu root_mmu; - - /* L1 MMU when running nested */ - struct kvm_mmu guest_mmu; - - /* - * Paging state of an L2 guest (used for nested npt) - * - * This context will save all necessary information to walk page tables - * of an L2 guest. This context is only initialized for page table - * walking and not for faulting since we never handle l2 page faults on - * the host. - */ - struct kvm_mmu nested_mmu; - - /* - * Pointer to the mmu context currently used for - * gva_to_gpa translations. - */ - struct kvm_mmu *walk_mmu; =20 struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; struct kvm_mmu_memory_cache mmu_shadow_page_cache; @@ -683,21 +708,6 @@ struct kvm_vcpu_arch { =20 struct asi_pgtbl_pool asi_pgtbl_pool; =20 - /* - * QEMU userspace and the guest each have their own FPU state. - * In vcpu_run, we switch between the user and guest FPU contexts. - * While running a VCPU, the VCPU thread will have the guest FPU - * context. - * - * Note that while the PKRU state lives inside the fpu registers, - * it is switched out separately at VMENTER and VMEXIT time. The - * "guest_fpstate" state here contains the guest FPU context, with the - * host PRKU bits. 
- */ - struct fpu_guest guest_fpu; - - u64 xcr0; - u64 guest_supported_xcr0; =20 struct kvm_pio_request pio; void *pio_data; @@ -734,7 +744,6 @@ struct kvm_vcpu_arch { =20 /* emulate context */ =20 - struct x86_emulate_ctxt *emulate_ctxt; bool emulate_regs_need_sync_to_vcpu; bool emulate_regs_need_sync_from_vcpu; int (*complete_userspace_io)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index dedabfdd292e..7192cbe06ba3 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -169,12 +169,12 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) =20 best =3D kvm_find_cpuid_entry(vcpu, 0xD, 0); if (best) - best->ebx =3D xstate_required_size(vcpu->arch.xcr0, false); + best->ebx =3D xstate_required_size(vcpu->arch.private->xcr0, false); =20 best =3D kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || cpuid_entry_has(best, X86_FEATURE_XSAVEC))) - best->ebx =3D xstate_required_size(vcpu->arch.xcr0, true); + best->ebx =3D xstate_required_size(vcpu->arch.private->xcr0, true); =20 best =3D kvm_find_kvm_cpuid_features(vcpu); if (kvm_hlt_in_guest(vcpu->kvm) && best && @@ -208,9 +208,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *v= cpu) =20 best =3D kvm_find_cpuid_entry(vcpu, 0xD, 0); if (!best) - vcpu->arch.guest_supported_xcr0 =3D 0; + vcpu->arch.private->guest_supported_xcr0 =3D 0; else - vcpu->arch.guest_supported_xcr0 =3D + vcpu->arch.private->guest_supported_xcr0 =3D (best->eax | ((u64)best->edx << 32)) & supported_xcr0; =20 /* @@ -223,8 +223,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *v= cpu) */ best =3D kvm_find_cpuid_entry(vcpu, 0x12, 0x1); if (best) { - best->ecx &=3D vcpu->arch.guest_supported_xcr0 & 0xffffffff; - best->edx &=3D vcpu->arch.guest_supported_xcr0 >> 32; + best->ecx &=3D vcpu->arch.private->guest_supported_xcr0 & 0xffffffff; + best->edx &=3D vcpu->arch.private->guest_supported_xcr0 >> 32; best->ecx |=3D XFEATURE_MASK_FPSSE; } =20 @@ -234,7 
+234,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *v= cpu) vcpu->arch.reserved_gpa_bits =3D kvm_vcpu_reserved_gpa_bits_raw(vcpu); =20 kvm_pmu_refresh(vcpu); - vcpu->arch.cr4_guest_rsvd_bits =3D + vcpu->arch.private->cr4_guest_rsvd_bits =3D __cr4_reserved_bits(guest_cpuid_has, vcpu); =20 kvm_hv_set_cpuid(vcpu); diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 90e1ffdc05b7..592780402160 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -12,12 +12,12 @@ #define BUILD_KVM_GPR_ACCESSORS(lname, uname) \ static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *v= cpu)\ { \ - return vcpu->arch.regs[VCPU_REGS_##uname]; \ + return vcpu->arch.private->regs[VCPU_REGS_##uname]; \ } \ static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, = \ unsigned long val) \ { \ - vcpu->arch.regs[VCPU_REGS_##uname] =3D val; \ + vcpu->arch.private->regs[VCPU_REGS_##uname] =3D val; \ } BUILD_KVM_GPR_ACCESSORS(rax, RAX) BUILD_KVM_GPR_ACCESSORS(rbx, RBX) @@ -82,7 +82,7 @@ static inline unsigned long kvm_register_read_raw(struct = kvm_vcpu *vcpu, int reg if (!kvm_register_is_available(vcpu, reg)) static_call(kvm_x86_cache_reg)(vcpu, reg); =20 - return vcpu->arch.regs[reg]; + return vcpu->arch.private->regs[reg]; } =20 static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg, @@ -91,7 +91,7 @@ static inline void kvm_register_write_raw(struct kvm_vcpu= *vcpu, int reg, if (WARN_ON_ONCE((unsigned int)reg >=3D NR_VCPU_REGS)) return; =20 - vcpu->arch.regs[reg] =3D val; + vcpu->arch.private->regs[reg] =3D val; kvm_register_mark_dirty(vcpu, reg); } =20 @@ -122,21 +122,21 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcp= u, int index) if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_PDPTR); =20 - return vcpu->arch.walk_mmu->pdptrs[index]; + return vcpu->arch.private->walk_mmu->pdptrs[index]; } =20 static inline void 
kvm_pdptr_write(struct kvm_vcpu *vcpu, int index, u64 v= alue) { - vcpu->arch.walk_mmu->pdptrs[index] =3D value; + vcpu->arch.private->walk_mmu->pdptrs[index] =3D value; } =20 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask) { ulong tmask =3D mask & KVM_POSSIBLE_CR0_GUEST_BITS; - if ((tmask & vcpu->arch.cr0_guest_owned_bits) && + if ((tmask & vcpu->arch.private->cr0_guest_owned_bits) && !kvm_register_is_available(vcpu, VCPU_EXREG_CR0)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR0); - return vcpu->arch.cr0 & mask; + return vcpu->arch.private->cr0 & mask; } =20 static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu) @@ -147,17 +147,17 @@ static inline ulong kvm_read_cr0(struct kvm_vcpu *vcp= u) static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask) { ulong tmask =3D mask & KVM_POSSIBLE_CR4_GUEST_BITS; - if ((tmask & vcpu->arch.cr4_guest_owned_bits) && + if ((tmask & vcpu->arch.private->cr4_guest_owned_bits) && !kvm_register_is_available(vcpu, VCPU_EXREG_CR4)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR4); - return vcpu->arch.cr4 & mask; + return vcpu->arch.private->cr4 & mask; } =20 static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu) { if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR3); - return vcpu->arch.cr3; + return vcpu->arch.private->cr3; } =20 static inline ulong kvm_read_cr4(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 60b84331007d..aea21355580d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -89,7 +89,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu); =20 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu) { - if (likely(vcpu->arch.mmu->root_hpa !=3D INVALID_PAGE)) + if (likely(vcpu->arch.private->mmu->root_hpa !=3D INVALID_PAGE)) return 0; =20 return kvm_mmu_load(vcpu); @@ -111,13 +111,13 @@ static inline unsigned long kvm_get_active_pcid(struc= t kvm_vcpu *vcpu) =20 static inline 
void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu) { - u64 root_hpa =3D vcpu->arch.mmu->root_hpa; + u64 root_hpa =3D vcpu->arch.private->mmu->root_hpa; =20 if (!VALID_PAGE(root_hpa)) return; =20 static_call(kvm_x86_load_mmu_pgd)(vcpu, root_hpa, - vcpu->arch.mmu->shadow_root_level); + vcpu->arch.private->mmu->shadow_root_level); } =20 struct kvm_page_fault { @@ -193,7 +193,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, .rsvd =3D err & PFERR_RSVD_MASK, .user =3D err & PFERR_USER_MASK, .prefetch =3D prefetch, - .is_tdp =3D likely(vcpu->arch.mmu->page_fault =3D=3D kvm_tdp_page_fault), + .is_tdp =3D likely(vcpu->arch.private->mmu->page_fault =3D=3D kvm_tdp_pa= ge_fault), .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(), =20 .max_level =3D KVM_MAX_HUGEPAGE_LEVEL, @@ -204,7 +204,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, if (fault.is_tdp) return kvm_tdp_page_fault(vcpu, &fault); #endif - return vcpu->arch.mmu->page_fault(vcpu, &fault); + return vcpu->arch.private->mmu->page_fault(vcpu, &fault); } =20 /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a2ada1104c2d..e36171f69b8e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -704,7 +704,7 @@ static bool mmu_spte_age(u64 *sptep) =20 static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu) { - if (is_tdp_mmu(vcpu->arch.mmu)) { + if (is_tdp_mmu(vcpu->arch.private->mmu)) { kvm_tdp_mmu_walk_lockless_begin(); } else { /* @@ -723,7 +723,7 @@ static void walk_shadow_page_lockless_begin(struct kvm_= vcpu *vcpu) =20 static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu) { - if (is_tdp_mmu(vcpu->arch.mmu)) { + if (is_tdp_mmu(vcpu->arch.private->mmu)) { kvm_tdp_mmu_walk_lockless_end(); } else { /* @@ -1909,7 +1909,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, struct list_head *invalid_list) { - int ret 
=3D vcpu->arch.mmu->sync_page(vcpu, sp); + int ret =3D vcpu->arch.private->mmu->sync_page(vcpu, sp); =20 if (ret < 0) { kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); @@ -2081,7 +2081,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct k= vm_vcpu *vcpu, int direct, unsigned int access) { - bool direct_mmu =3D vcpu->arch.mmu->direct_map; + bool direct_mmu =3D vcpu->arch.private->mmu->direct_map; union kvm_mmu_page_role role; struct hlist_head *sp_list; unsigned quadrant; @@ -2089,13 +2089,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct= kvm_vcpu *vcpu, int collisions =3D 0; LIST_HEAD(invalid_list); =20 - role =3D vcpu->arch.mmu->mmu_role.base; + role =3D vcpu->arch.private->mmu->mmu_role.base; role.level =3D level; role.direct =3D direct; if (role.direct) role.gpte_is_8_bytes =3D true; role.access =3D access; - if (!direct_mmu && vcpu->arch.mmu->root_level <=3D PT32_ROOT_LEVEL) { + if (!direct_mmu && vcpu->arch.private->mmu->root_level <=3D PT32_ROOT_LEV= EL) { quadrant =3D gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); quadrant &=3D (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; role.quadrant =3D quadrant; @@ -2181,11 +2181,11 @@ static void shadow_walk_init_using_root(struct kvm_= shadow_walk_iterator *iterato { iterator->addr =3D addr; iterator->shadow_addr =3D root; - iterator->level =3D vcpu->arch.mmu->shadow_root_level; + iterator->level =3D vcpu->arch.private->mmu->shadow_root_level; =20 if (iterator->level >=3D PT64_ROOT_4LEVEL && - vcpu->arch.mmu->root_level < PT64_ROOT_4LEVEL && - !vcpu->arch.mmu->direct_map) + vcpu->arch.private->mmu->root_level < PT64_ROOT_4LEVEL && + !vcpu->arch.private->mmu->direct_map) iterator->level =3D PT32E_ROOT_LEVEL; =20 if (iterator->level =3D=3D PT32E_ROOT_LEVEL) { @@ -2193,10 +2193,10 @@ static void shadow_walk_init_using_root(struct kvm_= shadow_walk_iterator *iterato * prev_root is currently only used for 64-bit hosts. So only * the active root_hpa is valid here. 
*/ - BUG_ON(root !=3D vcpu->arch.mmu->root_hpa); + BUG_ON(root !=3D vcpu->arch.private->mmu->root_hpa); =20 iterator->shadow_addr - =3D vcpu->arch.mmu->pae_root[(addr >> 30) & 3]; + =3D vcpu->arch.private->mmu->pae_root[(addr >> 30) & 3]; iterator->shadow_addr &=3D PT64_BASE_ADDR_MASK; --iterator->level; if (!iterator->shadow_addr) @@ -2207,7 +2207,7 @@ static void shadow_walk_init_using_root(struct kvm_sh= adow_walk_iterator *iterato static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, u64 addr) { - shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa, + shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.private->mmu->root= _hpa, addr); } =20 @@ -2561,7 +2561,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcp= u *vcpu, gva_t gva) gpa_t gpa; int r; =20 - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) return 0; =20 gpa =3D kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); @@ -3186,7 +3186,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault) do { u64 new_spte; =20 - if (is_tdp_mmu(vcpu->arch.mmu)) + if (is_tdp_mmu(vcpu->arch.private->mmu)) sptep =3D kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte); else sptep =3D fast_pf_get_last_sptep(vcpu, fault->addr, &spte); @@ -3393,7 +3393,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gf= n_t gfn, gva_t gva, =20 static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; u8 shadow_root_level =3D mmu->shadow_root_level; hpa_t root; unsigned i; @@ -3501,7 +3501,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kv= m) =20 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; u64 pdptrs[4], pm_mask; gfn_t root_gfn, root_pgd; hpa_t root; @@ -3611,7 +3611,7 @@ static int 
mmu_alloc_shadow_roots(struct kvm_vcpu *vc= pu) =20 static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; bool need_pml5 =3D mmu->shadow_root_level > PT64_ROOT_4LEVEL; u64 *pml5_root =3D NULL; u64 *pml4_root =3D NULL; @@ -3712,16 +3712,16 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) int i; struct kvm_mmu_page *sp; =20 - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) return; =20 - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (!VALID_PAGE(vcpu->arch.private->mmu->root_hpa)) return; =20 vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); =20 - if (vcpu->arch.mmu->root_level >=3D PT64_ROOT_4LEVEL) { - hpa_t root =3D vcpu->arch.mmu->root_hpa; + if (vcpu->arch.private->mmu->root_level >=3D PT64_ROOT_4LEVEL) { + hpa_t root =3D vcpu->arch.private->mmu->root_hpa; sp =3D to_shadow_page(root); =20 if (!is_unsync_root(root)) @@ -3741,7 +3741,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); =20 for (i =3D 0; i < 4; ++i) { - hpa_t root =3D vcpu->arch.mmu->pae_root[i]; + hpa_t root =3D vcpu->arch.private->mmu->pae_root[i]; =20 if (IS_VALID_PAE_ROOT(root)) { root &=3D PT64_BASE_ADDR_MASK; @@ -3760,11 +3760,11 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu) int i; =20 for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) - if (is_unsync_root(vcpu->arch.mmu->prev_roots[i].hpa)) + if (is_unsync_root(vcpu->arch.private->mmu->prev_roots[i].hpa)) roots_to_free |=3D KVM_MMU_ROOT_PREVIOUS(i); =20 /* sync prev_roots by simply freeing them */ - kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free); + kvm_mmu_free_roots(vcpu, vcpu->arch.private->mmu, roots_to_free); } =20 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr, @@ -3781,7 +3781,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_v= cpu *vcpu, gpa_t vaddr, { if (exception) exception->error_code =3D 0; - return 
vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception= ); + return vcpu->arch.private->nested_mmu.translate_gpa(vcpu, vaddr, access, = exception); } =20 static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direc= t) @@ -3834,7 +3834,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 = addr, u64 *sptep) =20 walk_shadow_page_lockless_begin(vcpu); =20 - if (is_tdp_mmu(vcpu->arch.mmu)) + if (is_tdp_mmu(vcpu->arch.private->mmu)) leaf =3D kvm_tdp_mmu_get_walk(vcpu, addr, sptes, &root); else leaf =3D get_walk(vcpu, addr, sptes, &root); @@ -3857,7 +3857,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 = addr, u64 *sptep) if (!is_shadow_present_pte(sptes[leaf])) leaf++; =20 - rsvd_check =3D &vcpu->arch.mmu->shadow_zero_check; + rsvd_check =3D &vcpu->arch.private->mmu->shadow_zero_check; =20 for (level =3D root; level >=3D leaf; level--) reserved |=3D is_rsvd_spte(rsvd_check, sptes[level], level); @@ -3945,8 +3945,8 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *= vcpu, gpa_t cr2_or_gpa, =20 arch.token =3D (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id; arch.gfn =3D gfn; - arch.direct_map =3D vcpu->arch.mmu->direct_map; - arch.cr3 =3D vcpu->arch.mmu->get_guest_pgd(vcpu); + arch.direct_map =3D vcpu->arch.private->mmu->direct_map; + arch.cr3 =3D vcpu->arch.private->mmu->get_guest_pgd(vcpu); =20 return kvm_setup_async_pf(vcpu, cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); @@ -4029,7 +4029,7 @@ static void vcpu_fill_asi_pgtbl_pool(struct kvm_vcpu = *vcpu) static bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int mmu_seq) { - struct kvm_mmu_page *sp =3D to_shadow_page(vcpu->arch.mmu->root_hpa); + struct kvm_mmu_page *sp =3D to_shadow_page(vcpu->arch.private->mmu->root_= hpa); =20 /* Special roots, e.g. pae_root, are not backed by shadow pages. 
*/ if (sp && is_obsolete_sp(vcpu->kvm, sp)) @@ -4052,7 +4052,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, =20 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault = *fault) { - bool is_tdp_mmu_fault =3D is_tdp_mmu(vcpu->arch.mmu); + bool is_tdp_mmu_fault =3D is_tdp_mmu(vcpu->arch.private->mmu); =20 unsigned long mmu_seq; bool try_asi_map; @@ -4206,7 +4206,7 @@ static bool cached_root_available(struct kvm_vcpu *vc= pu, gpa_t new_pgd, { uint i; struct kvm_mmu_root_info root; - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; =20 root.pgd =3D mmu->root_pgd; root.hpa =3D mmu->root_hpa; @@ -4230,7 +4230,7 @@ static bool cached_root_available(struct kvm_vcpu *vc= pu, gpa_t new_pgd, static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd, union kvm_mmu_page_role new_role) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; =20 /* * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid @@ -4248,7 +4248,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, = gpa_t new_pgd, union kvm_mmu_page_role new_role) { if (!fast_pgd_switch(vcpu, new_pgd, new_role)) { - kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT); + kvm_mmu_free_roots(vcpu, vcpu->arch.private->mmu, KVM_MMU_ROOT_CURRENT); return; } =20 @@ -4279,7 +4279,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, = gpa_t new_pgd, */ if (!new_role.direct) __clear_sp_write_flooding_count( - to_shadow_page(vcpu->arch.mmu->root_hpa)); + to_shadow_page(vcpu->arch.private->mmu->root_hpa)); } =20 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) @@ -4826,7 +4826,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, =20 static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) { - struct kvm_mmu *context =3D &vcpu->arch.root_mmu; + struct kvm_mmu *context =3D &vcpu->arch.private->root_mmu; struct kvm_mmu_role_regs regs =3D vcpu_to_role_regs(vcpu); union 
kvm_mmu_role new_role =3D kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false); @@ -4914,7 +4914,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *= vcpu, struct kvm_mmu *conte static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu_role_regs *regs) { - struct kvm_mmu *context =3D &vcpu->arch.root_mmu; + struct kvm_mmu *context =3D &vcpu->arch.private->root_mmu; union kvm_mmu_role new_role =3D kvm_calc_shadow_mmu_root_page_role(vcpu, regs, false); =20 @@ -4937,7 +4937,7 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *v= cpu, void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, unsigned long cr4, u64 efer, gpa_t nested_cr3) { - struct kvm_mmu *context =3D &vcpu->arch.guest_mmu; + struct kvm_mmu *context =3D &vcpu->arch.private->guest_mmu; struct kvm_mmu_role_regs regs =3D { .cr0 =3D cr0, .cr4 =3D cr4 & ~X86_CR4_PKE, @@ -4960,7 +4960,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *v= cpu, bool accessed_dirty, union kvm_mmu_role role =3D {0}; =20 /* SMM flag is inherited from root_mmu */ - role.base.smm =3D vcpu->arch.root_mmu.mmu_role.base.smm; + role.base.smm =3D vcpu->arch.private->root_mmu.mmu_role.base.smm; =20 role.base.level =3D level; role.base.gpte_is_8_bytes =3D true; @@ -4980,7 +4980,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *v= cpu, bool accessed_dirty, void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, bool accessed_dirty, gpa_t new_eptp) { - struct kvm_mmu *context =3D &vcpu->arch.guest_mmu; + struct kvm_mmu *context =3D &vcpu->arch.private->guest_mmu; u8 level =3D vmx_eptp_page_walk_level(new_eptp); union kvm_mmu_role new_role =3D kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty, @@ -5012,7 +5012,7 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu); =20 static void init_kvm_softmmu(struct kvm_vcpu *vcpu) { - struct kvm_mmu *context =3D &vcpu->arch.root_mmu; + struct kvm_mmu *context =3D &vcpu->arch.private->root_mmu; struct kvm_mmu_role_regs regs =3D 
vcpu_to_role_regs(vcpu); =20 kvm_init_shadow_mmu(vcpu, &regs); @@ -5043,7 +5043,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu) { struct kvm_mmu_role_regs regs =3D vcpu_to_role_regs(vcpu); union kvm_mmu_role new_role =3D kvm_calc_nested_mmu_role(vcpu, &regs); - struct kvm_mmu *g_context =3D &vcpu->arch.nested_mmu; + struct kvm_mmu *g_context =3D &vcpu->arch.private->nested_mmu; =20 if (new_role.as_u64 =3D=3D g_context->mmu_role.as_u64) return; @@ -5061,9 +5061,9 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu) g_context->invlpg =3D NULL; =20 /* - * Note that arch.mmu->gva_to_gpa translates l2_gpa to l1_gpa using + * Note that arch.private->mmu->gva_to_gpa translates l2_gpa to l1_gpa us= ing * L1's nested page tables (e.g. EPT12). The nested translation - * of l2_gva to l1_gpa is done by arch.nested_mmu.gva_to_gpa using + * of l2_gva to l1_gpa is done by arch.private->nested_mmu.gva_to_gpa usi= ng * L2's page tables as the first level of translation and L1's * nested page tables as the second level of translation. Basically * the gva_to_gpa functions between mmu and nested_mmu are swapped. @@ -5119,9 +5119,9 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) * problem is swept under the rug; KVM's CPUID API is horrific and * it's all but impossible to solve it without introducing a new API. 
*/ - vcpu->arch.root_mmu.mmu_role.ext.valid =3D 0; - vcpu->arch.guest_mmu.mmu_role.ext.valid =3D 0; - vcpu->arch.nested_mmu.mmu_role.ext.valid =3D 0; + vcpu->arch.private->root_mmu.mmu_role.ext.valid =3D 0; + vcpu->arch.private->guest_mmu.mmu_role.ext.valid =3D 0; + vcpu->arch.private->nested_mmu.mmu_role.ext.valid =3D 0; kvm_mmu_reset_context(vcpu); =20 /* @@ -5142,13 +5142,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) { int r; =20 - r =3D mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map); + r =3D mmu_topup_memory_caches(vcpu, !vcpu->arch.private->mmu->direct_map); if (r) goto out; r =3D mmu_alloc_special_roots(vcpu); if (r) goto out; - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) r =3D mmu_alloc_direct_roots(vcpu); else r =3D mmu_alloc_shadow_roots(vcpu); @@ -5165,10 +5165,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) =20 void kvm_mmu_unload(struct kvm_vcpu *vcpu) { - kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa)); - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa)); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->root_mmu, KVM_MMU_ROOTS_ALL= ); + WARN_ON(VALID_PAGE(vcpu->arch.private->root_mmu.root_hpa)); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_AL= L); + WARN_ON(VALID_PAGE(vcpu->arch.private->guest_mmu.root_hpa)); } =20 static bool need_remote_flush(u64 old, u64 new) @@ -5351,9 +5351,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t c= r2_or_gpa, u64 error_code, void *insn, int insn_len) { int r, emulation_type =3D EMULTYPE_PF; - bool direct =3D vcpu->arch.mmu->direct_map; + bool direct =3D vcpu->arch.private->mmu->direct_map; =20 - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.private->mmu->root_hpa))) return RET_PF_RETRY; =20 r =3D RET_PF_INVALID; @@ -5382,14 +5382,14 @@ int kvm_mmu_page_fault(struct 
kvm_vcpu *vcpu, gpa_t= cr2_or_gpa, u64 error_code, * paging in both guests. If true, we simply unprotect the page * and resume the guest. */ - if (vcpu->arch.mmu->direct_map && + if (vcpu->arch.private->mmu->direct_map && (error_code & PFERR_NESTED_GUEST_PAGE) =3D=3D PFERR_NESTED_GUEST_PAGE= ) { kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); return 1; } =20 /* - * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we can still + * vcpu->arch.private->mmu.page_fault returned RET_PF_EMULATE, but we can= still * optimistically try to just unprotect the page and let the processor * re-execute the instruction that caused the page fault. Do not allow * retrying MMIO emulation, as it's not only pointless but could also @@ -5412,8 +5412,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu, { int i; =20 - /* It's actually a GPA for vcpu->arch.guest_mmu. */ - if (mmu !=3D &vcpu->arch.guest_mmu) { + /* It's actually a GPA for vcpu->arch.private->guest_mmu. */ + if (mmu !=3D &vcpu->arch.private->guest_mmu) { /* INVLPG on a non-canonical address is a NOP according to the SDM. 
*/ if (is_noncanonical_address(gva, vcpu)) return; @@ -5448,7 +5448,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu, =20 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) { - kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE); + kvm_mmu_invalidate_gva(vcpu, vcpu->arch.private->walk_mmu, gva, INVALID_P= AGE); ++vcpu->stat.invlpg; } EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); @@ -5456,7 +5456,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); =20 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long p= cid) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; bool tlb_flush =3D false; uint i; =20 @@ -5638,24 +5638,24 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_shadow_page_cache.gfp_asi =3D 0; #endif =20 - vcpu->arch.mmu =3D &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; + vcpu->arch.private->mmu =3D &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu =3D &vcpu->arch.private->root_mmu; =20 - vcpu->arch.nested_mmu.translate_gpa =3D translate_nested_gpa; + vcpu->arch.private->nested_mmu.translate_gpa =3D translate_nested_gpa; =20 asi_init_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); =20 - ret =3D __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu); + ret =3D __kvm_mmu_create(vcpu, &vcpu->arch.private->guest_mmu); if (ret) return ret; =20 - ret =3D __kvm_mmu_create(vcpu, &vcpu->arch.root_mmu); + ret =3D __kvm_mmu_create(vcpu, &vcpu->arch.private->root_mmu); if (ret) goto fail_allocate_root; =20 return ret; fail_allocate_root: - free_mmu_pages(&vcpu->arch.guest_mmu); + free_mmu_pages(&vcpu->arch.private->guest_mmu); return ret; } =20 @@ -6261,8 +6261,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(str= uct kvm *kvm) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); - free_mmu_pages(&vcpu->arch.root_mmu); - free_mmu_pages(&vcpu->arch.guest_mmu); + free_mmu_pages(&vcpu->arch.private->root_mmu); + 
free_mmu_pages(&vcpu->arch.private->guest_mmu); mmu_free_memory_caches(vcpu); asi_clear_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 52c6527b1a06..57ec9dd147da 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -114,7 +114,7 @@ static inline bool kvm_vcpu_ad_need_write_protect(struc= t kvm_vcpu *vcpu) * being enabled is mandatory as the bits used to denote WP-only SPTEs * are reserved for NPT w/ PAE (32-bit KVM). */ - return vcpu->arch.mmu =3D=3D &vcpu->arch.guest_mmu && + return vcpu->arch.private->mmu =3D=3D &vcpu->arch.private->guest_mmu && kvm_x86_ops.cpu_dirty_log_size; } =20 diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 193317ad60a4..c39a1a870a2b 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -194,11 +194,11 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_v= cpu *vcpu, goto no_present; =20 /* if accessed bit is not supported prefetch non accessed gpte */ - if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.mmu) && + if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.private->mmu) && !(gpte & PT_GUEST_ACCESSED_MASK)) goto no_present; =20 - if (FNAME(is_rsvd_bits_set)(vcpu->arch.mmu, gpte, PG_LEVEL_4K)) + if (FNAME(is_rsvd_bits_set)(vcpu->arch.private->mmu, gpte, PG_LEVEL_4K)) goto no_present; =20 return false; @@ -533,7 +533,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, } #endif walker->fault.address =3D addr; - walker->fault.nested_page_fault =3D mmu !=3D vcpu->arch.walk_mmu; + walker->fault.nested_page_fault =3D mmu !=3D vcpu->arch.private->walk_mmu; walker->fault.async_page_fault =3D false; =20 trace_kvm_mmu_walker_error(walker->fault.error_code); @@ -543,7 +543,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, static int FNAME(walk_addr)(struct guest_walker *walker, struct kvm_vcpu *vcpu, gpa_t addr, u32 access) { - return 
FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr, + return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.private->mmu, ad= dr, access); } =20 @@ -552,7 +552,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker = *walker, struct kvm_vcpu *vcpu, gva_t addr, u32 access) { - return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu, + return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.private->nested= _mmu, addr, access); } #endif @@ -573,7 +573,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_= mmu_page *sp, =20 gfn =3D gpte_to_gfn(gpte); pte_access =3D sp->role.access & FNAME(gpte_access)(gpte); - FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); + FNAME(protect_clean_gpte)(vcpu->arch.private->mmu, &pte_access, gpte); =20 slot =3D gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log && (pte_access & ACC_WRITE_MASK)); @@ -670,7 +670,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, WARN_ON_ONCE(gw->gfn !=3D base_gfn); direct_access =3D gw->pte_access; =20 - top_level =3D vcpu->arch.mmu->root_level; + top_level =3D vcpu->arch.private->mmu->root_level; if (top_level =3D=3D PT32E_ROOT_LEVEL) top_level =3D PT32_ROOT_LEVEL; /* @@ -682,7 +682,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, if (FNAME(gpte_changed)(vcpu, gw, top_level)) goto out_gpte_changed; =20 - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.private->mmu->root_hpa))) goto out_gpte_changed; =20 for (shadow_walk_init(&it, vcpu, fault->addr); @@ -806,7 +806,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, bool self_changed =3D false; =20 if (!(walker->pte_access & ACC_WRITE_MASK || - (!is_cr0_wp(vcpu->arch.mmu) && !user_fault))) + (!is_cr0_wp(vcpu->arch.private->mmu) && !user_fault))) return false; =20 for (level =3D walker->level; level <=3D walker->max_level; level++) { @@ -905,7 +905,7 @@ static int FNAME(page_fault)(struct 
kvm_vcpu *vcpu, str= uct kvm_page_fault *fault * we will cache the incorrect access into mmio spte. */ if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) && - !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) { + !is_cr0_wp(vcpu->arch.private->mmu) && !fault->user && fault->slot) { walker.pte_access |=3D ACC_WRITE_MASK; walker.pte_access &=3D ~ACC_USER_MASK; =20 @@ -915,7 +915,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault * then we should prevent the kernel from executing it * if SMEP is enabled. */ - if (is_cr4_smep(vcpu->arch.mmu)) + if (is_cr4_smep(vcpu->arch.private->mmu)) walker.pte_access &=3D ~ACC_EXEC_MASK; } =20 @@ -1071,7 +1071,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu= *vcpu, gpa_t vaddr, */ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) { - union kvm_mmu_page_role mmu_role =3D vcpu->arch.mmu->mmu_role.base; + union kvm_mmu_page_role mmu_role =3D vcpu->arch.private->mmu->mmu_role.ba= se; int i; bool host_writable; gpa_t first_pte_gpa; @@ -1129,7 +1129,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp) gfn =3D gpte_to_gfn(gpte); pte_access =3D sp->role.access; pte_access &=3D FNAME(gpte_access)(gpte); - FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); + FNAME(protect_clean_gpte)(vcpu->arch.private->mmu, &pte_access, gpte); =20 if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access)) continue; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 13038fae5088..df14b6639b35 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -177,9 +177,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, if (prefetch) spte =3D mark_spte_for_access_track(spte); =20 - WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level), + WARN_ONCE(is_rsvd_spte(&vcpu->arch.private->mmu->shadow_zero_check, spte,= level), "spte =3D 0x%llx, level =3D %d, rsvd bits =3D 0x%llx", 
spte, level, - get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); + get_rsvd_bits(&vcpu->arch.private->mmu->shadow_zero_check, spte, level= )); =20 if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { /* Enforced by kvm_mmu_hugepage_adjust. */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1beb4ca90560..c3634ac01869 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -162,7 +162,7 @@ static union kvm_mmu_page_role page_role_for_level(stru= ct kvm_vcpu *vcpu, { union kvm_mmu_page_role role; =20 - role =3D vcpu->arch.mmu->mmu_role.base; + role =3D vcpu->arch.private->mmu->mmu_role.base; role.level =3D level; role.direct =3D true; role.gpte_is_8_bytes =3D true; @@ -198,7 +198,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) =20 lockdep_assert_held_write(&kvm->mmu_lock); =20 - role =3D page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level); + role =3D page_role_for_level(vcpu, vcpu->arch.private->mmu->shadow_root_l= evel); =20 /* Check for an existing root before allocating a new one. 
*/ for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { @@ -207,7 +207,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) goto out; } =20 - root =3D alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level); + root =3D alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.private->mmu->shadow_root= _level); refcount_set(&root->tdp_mmu_root_count, 1); =20 spin_lock(&kvm->arch.tdp_mmu_pages_lock); @@ -952,7 +952,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_v= cpu *vcpu, */ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; struct tdp_iter iter; struct kvm_mmu_page *sp; u64 *child_pt; @@ -1486,11 +1486,11 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64= addr, u64 *sptes, int *root_level) { struct tdp_iter iter; - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; gfn_t gfn =3D addr >> PAGE_SHIFT; int leaf =3D -1; =20 - *root_level =3D vcpu->arch.mmu->shadow_root_level; + *root_level =3D vcpu->arch.private->mmu->shadow_root_level; =20 tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf =3D iter.level; @@ -1515,7 +1515,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vc= pu *vcpu, u64 addr, u64 *spte) { struct tdp_iter iter; - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; gfn_t gfn =3D addr >> PAGE_SHIFT; tdp_ptep_t sptep =3D NULL; =20 diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index f8b7bc04b3e7..c90ef5bf26cf 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -97,7 +97,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *= vcpu) =20 WARN_ON(mmu_is_nested(vcpu)); =20 - vcpu->arch.mmu =3D &vcpu->arch.guest_mmu; + vcpu->arch.private->mmu =3D &vcpu->arch.private->guest_mmu; =20 /* * The NPT format depends on L1's CR4 and EFER, which is in vmcb01. 
Note, @@ -107,16 +107,16 @@ static void nested_svm_init_mmu_context(struct kvm_vc= pu *vcpu) kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4, svm->vmcb01.ptr->save.efer, svm->nested.ctl.nested_cr3); - vcpu->arch.mmu->get_guest_pgd =3D nested_svm_get_tdp_cr3; - vcpu->arch.mmu->get_pdptr =3D nested_svm_get_tdp_pdptr; - vcpu->arch.mmu->inject_page_fault =3D nested_svm_inject_npf_exit; - vcpu->arch.walk_mmu =3D &vcpu->arch.nested_mmu; + vcpu->arch.private->mmu->get_guest_pgd =3D nested_svm_get_tdp_cr3; + vcpu->arch.private->mmu->get_pdptr =3D nested_svm_get_tdp_pdptr; + vcpu->arch.private->mmu->inject_page_fault =3D nested_svm_inject_npf_exit; + vcpu->arch.private->walk_mmu =3D &vcpu->arch.private->neste= d_mmu; } =20 static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu) { - vcpu->arch.mmu =3D &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; + vcpu->arch.private->mmu =3D &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu =3D &vcpu->arch.private->root_mmu; } =20 void recalc_intercepts(struct vcpu_svm *svm) @@ -437,13 +437,13 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu,= unsigned long cr3, return -EINVAL; =20 if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) && - CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))) + CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, cr3))) return -EINVAL; =20 if (!nested_npt) kvm_mmu_new_pgd(vcpu, cr3); =20 - vcpu->arch.cr3 =3D cr3; + vcpu->arch.private->cr3 =3D cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); =20 /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. 
*/ @@ -500,7 +500,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm = *svm, struct vmcb *vmcb12 svm_set_cr0(&svm->vcpu, vmcb12->save.cr0); svm_set_cr4(&svm->vcpu, vmcb12->save.cr4); =20 - svm->vcpu.arch.cr2 =3D vmcb12->save.cr2; + svm->vcpu.arch.private->cr2 =3D vmcb12->save.cr2; =20 kvm_rax_write(&svm->vcpu, vmcb12->save.rax); kvm_rsp_write(&svm->vcpu, vmcb12->save.rsp); @@ -634,7 +634,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmc= b12_gpa, return ret; =20 if (!npt_enabled) - vcpu->arch.mmu->inject_page_fault =3D svm_inject_page_fault_nested; + vcpu->arch.private->mmu->inject_page_fault =3D svm_inject_page_fault_nes= ted; =20 if (!from_vmrun) kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu); @@ -695,7 +695,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) */ svm->vmcb01.ptr->save.efer =3D vcpu->arch.efer; svm->vmcb01.ptr->save.cr0 =3D kvm_read_cr0(vcpu); - svm->vmcb01.ptr->save.cr4 =3D vcpu->arch.cr4; + svm->vmcb01.ptr->save.cr4 =3D vcpu->arch.private->cr4; svm->vmcb01.ptr->save.rflags =3D kvm_get_rflags(vcpu); svm->vmcb01.ptr->save.rip =3D kvm_rip_read(vcpu); =20 @@ -805,7 +805,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm) vmcb12->save.cr0 =3D kvm_read_cr0(vcpu); vmcb12->save.cr3 =3D kvm_read_cr3(vcpu); vmcb12->save.cr2 =3D vmcb->save.cr2; - vmcb12->save.cr4 =3D svm->vcpu.arch.cr4; + vmcb12->save.cr4 =3D svm->vcpu.arch.private->cr4; vmcb12->save.rflags =3D kvm_get_rflags(vcpu); vmcb12->save.rip =3D kvm_rip_read(vcpu); vmcb12->save.rsp =3D kvm_rsp_read(vcpu); @@ -991,7 +991,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm = *svm) if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT))) return NESTED_EXIT_HOST; =20 - msr =3D svm->vcpu.arch.regs[VCPU_REGS_RCX]; + msr =3D svm->vcpu.arch.private->regs[VCPU_REGS_RCX]; offset =3D svm_msrpm_offset(msr); write =3D svm->vmcb->control.exit_info_1 & 1; mask =3D 1 << ((2 * (msr & 0xf)) + write); @@ -1131,7 +1131,7 @@ static void nested_svm_inject_exception_vmexit(struct= 
vcpu_svm *svm) else if (svm->vcpu.arch.exception.has_payload) svm->vmcb->control.exit_info_2 =3D svm->vcpu.arch.exception.payload; else - svm->vmcb->control.exit_info_2 =3D svm->vcpu.arch.cr2; + svm->vmcb->control.exit_info_2 =3D svm->vcpu.arch.private->cr2; } else if (nr =3D=3D DB_VECTOR) { /* See inject_pending_event. */ kvm_deliver_exception_payload(&svm->vcpu); @@ -1396,7 +1396,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, * Set it again to fix this. */ =20 - ret =3D nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3, + ret =3D nested_svm_load_cr3(&svm->vcpu, vcpu->arch.private->cr3, nested_npt_enabled(svm), false); if (WARN_ON_ONCE(ret)) goto out_free; @@ -1449,7 +1449,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcp= u *vcpu) * the guest CR3 might be restored prior to setting the nested * state which can lead to a load of wrong PDPTRs. */ - if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3))) + if (CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.priva= te->cr3))) return false; =20 if (!nested_svm_vmrun_msrpm(svm)) { diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index be2883141220..9c62566ddde8 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -565,28 +565,28 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) return -EINVAL; =20 /* Sync registgers */ - save->rax =3D svm->vcpu.arch.regs[VCPU_REGS_RAX]; - save->rbx =3D svm->vcpu.arch.regs[VCPU_REGS_RBX]; - save->rcx =3D svm->vcpu.arch.regs[VCPU_REGS_RCX]; - save->rdx =3D svm->vcpu.arch.regs[VCPU_REGS_RDX]; - save->rsp =3D svm->vcpu.arch.regs[VCPU_REGS_RSP]; - save->rbp =3D svm->vcpu.arch.regs[VCPU_REGS_RBP]; - save->rsi =3D svm->vcpu.arch.regs[VCPU_REGS_RSI]; - save->rdi =3D svm->vcpu.arch.regs[VCPU_REGS_RDI]; + save->rax =3D svm->vcpu.arch.private->regs[VCPU_REGS_RAX]; + save->rbx =3D svm->vcpu.arch.private->regs[VCPU_REGS_RBX]; + save->rcx =3D svm->vcpu.arch.private->regs[VCPU_REGS_RCX]; + save->rdx =3D 
svm->vcpu.arch.private->regs[VCPU_REGS_RDX]; + save->rsp =3D svm->vcpu.arch.private->regs[VCPU_REGS_RSP]; + save->rbp =3D svm->vcpu.arch.private->regs[VCPU_REGS_RBP]; + save->rsi =3D svm->vcpu.arch.private->regs[VCPU_REGS_RSI]; + save->rdi =3D svm->vcpu.arch.private->regs[VCPU_REGS_RDI]; #ifdef CONFIG_X86_64 - save->r8 =3D svm->vcpu.arch.regs[VCPU_REGS_R8]; - save->r9 =3D svm->vcpu.arch.regs[VCPU_REGS_R9]; - save->r10 =3D svm->vcpu.arch.regs[VCPU_REGS_R10]; - save->r11 =3D svm->vcpu.arch.regs[VCPU_REGS_R11]; - save->r12 =3D svm->vcpu.arch.regs[VCPU_REGS_R12]; - save->r13 =3D svm->vcpu.arch.regs[VCPU_REGS_R13]; - save->r14 =3D svm->vcpu.arch.regs[VCPU_REGS_R14]; - save->r15 =3D svm->vcpu.arch.regs[VCPU_REGS_R15]; + save->r8 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R8]; + save->r9 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R9]; + save->r10 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R10]; + save->r11 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R11]; + save->r12 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R12]; + save->r13 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R13]; + save->r14 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R14]; + save->r15 =3D svm->vcpu.arch.private->regs[VCPU_REGS_R15]; #endif - save->rip =3D svm->vcpu.arch.regs[VCPU_REGS_RIP]; + save->rip =3D svm->vcpu.arch.private->regs[VCPU_REGS_RIP]; =20 /* Sync some non-GPR registers before encrypting */ - save->xcr0 =3D svm->vcpu.arch.xcr0; + save->xcr0 =3D svm->vcpu.arch.private->xcr0; save->pkru =3D svm->vcpu.arch.pkru; save->xss =3D svm->vcpu.arch.ia32_xss; save->dr6 =3D svm->vcpu.arch.dr6; @@ -2301,10 +2301,10 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *sv= m) * Copy their values, even if they may not have been written during the * VM-Exit. It's the guest's responsibility to not consume random data. 
*/ - ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]); - ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]); - ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]); - ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]); + ghcb_set_rax(ghcb, vcpu->arch.private->regs[VCPU_REGS_RAX]); + ghcb_set_rbx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RBX]); + ghcb_set_rcx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RCX]); + ghcb_set_rdx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RDX]); } =20 static void sev_es_sync_from_ghcb(struct vcpu_svm *svm) @@ -2326,18 +2326,18 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *= svm) * * Copy their values to the appropriate location if supplied. */ - memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); + memset(vcpu->arch.private->regs, 0, sizeof(vcpu->arch.private->regs)); =20 - vcpu->arch.regs[VCPU_REGS_RAX] =3D ghcb_get_rax_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RBX] =3D ghcb_get_rbx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RCX] =3D ghcb_get_rcx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RDX] =3D ghcb_get_rdx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RSI] =3D ghcb_get_rsi_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RAX] =3D ghcb_get_rax_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RBX] =3D ghcb_get_rbx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RCX] =3D ghcb_get_rcx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RDX] =3D ghcb_get_rdx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RSI] =3D ghcb_get_rsi_if_valid(ghcb); =20 svm->vmcb->save.cpl =3D ghcb_get_cpl_if_valid(ghcb); =20 if (ghcb_xcr0_is_valid(ghcb)) { - vcpu->arch.xcr0 =3D ghcb_get_xcr0(ghcb); + vcpu->arch.private->xcr0 =3D ghcb_get_xcr0(ghcb); kvm_update_cpuid_runtime(vcpu); } =20 @@ -2667,8 +2667,8 @@ static int sev_handle_vmgexit_msr_protocol(struct vcp= u_svm *svm) GHCB_MSR_CPUID_FUNC_POS); =20 /* Initialize the registers needed by the CPUID intercept */ - vcpu->arch.regs[VCPU_REGS_RAX] =3D cpuid_fn; - 
vcpu->arch.regs[VCPU_REGS_RCX] =3D 0; + vcpu->arch.private->regs[VCPU_REGS_RAX] =3D cpuid_fn; + vcpu->arch.private->regs[VCPU_REGS_RCX] =3D 0; =20 ret =3D svm_invoke_exit_handler(vcpu, SVM_EXIT_CPUID); if (!ret) { @@ -2680,13 +2680,13 @@ static int sev_handle_vmgexit_msr_protocol(struct v= cpu_svm *svm) GHCB_MSR_CPUID_REG_MASK, GHCB_MSR_CPUID_REG_POS); if (cpuid_reg =3D=3D 0) - cpuid_value =3D vcpu->arch.regs[VCPU_REGS_RAX]; + cpuid_value =3D vcpu->arch.private->regs[VCPU_REGS_RAX]; else if (cpuid_reg =3D=3D 1) - cpuid_value =3D vcpu->arch.regs[VCPU_REGS_RBX]; + cpuid_value =3D vcpu->arch.private->regs[VCPU_REGS_RBX]; else if (cpuid_reg =3D=3D 2) - cpuid_value =3D vcpu->arch.regs[VCPU_REGS_RCX]; + cpuid_value =3D vcpu->arch.private->regs[VCPU_REGS_RCX]; else - cpuid_value =3D vcpu->arch.regs[VCPU_REGS_RDX]; + cpuid_value =3D vcpu->arch.private->regs[VCPU_REGS_RDX]; =20 set_ghcb_msr_bits(svm, cpuid_value, GHCB_MSR_CPUID_VALUE_MASK, diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 5151efa424ac..516af87e7ab1 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1425,10 +1425,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu) /* * SEV-ES guests maintain an encrypted version of their FPU * state which is restored and saved on VMRUN and VMEXIT. - * Mark vcpu->arch.guest_fpu->fpstate as scratch so it won't + * Mark vcpu->arch.private->guest_fpu->fpstate as scratch so it won't * do xsave/xrstor on it. 
*/ - fpstate_set_confidential(&vcpu->arch.guest_fpu); + fpstate_set_confidential(&vcpu->arch.private->guest_fpu); } =20 err =3D avic_init_vcpu(svm); @@ -1599,7 +1599,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum= kvm_reg reg) switch (reg) { case VCPU_EXREG_PDPTR: BUG_ON(!npt_enabled); - load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)); + load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu)); break; default: KVM_BUG_ON(1, vcpu->kvm); @@ -1804,7 +1804,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long= cr0) } } #endif - vcpu->arch.cr0 =3D cr0; + vcpu->arch.private->cr0 =3D cr0; =20 if (!npt_enabled) hcr0 |=3D X86_CR0_PG | X86_CR0_WP; @@ -1845,12 +1845,12 @@ static bool svm_is_valid_cr4(struct kvm_vcpu *vcpu,= unsigned long cr4) void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long host_cr4_mce =3D cr4_read_shadow() & X86_CR4_MCE; - unsigned long old_cr4 =3D vcpu->arch.cr4; + unsigned long old_cr4 =3D vcpu->arch.private->cr4; =20 if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE)) svm_flush_tlb(vcpu); =20 - vcpu->arch.cr4 =3D cr4; + vcpu->arch.private->cr4 =3D cr4; if (!npt_enabled) cr4 |=3D X86_CR4_PAE; cr4 |=3D host_cr4_mce; @@ -2239,7 +2239,7 @@ enum { /* Return NONE_SVM_INSTR if not SVM instrs, otherwise return decode result= */ static int svm_instr_opcode(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; =20 if (ctxt->b !=3D 0x1 || ctxt->opcode_len !=3D 2) return NONE_SVM_INSTR; @@ -2513,7 +2513,7 @@ static bool check_selective_cr0_intercepted(struct kv= m_vcpu *vcpu, unsigned long val) { struct vcpu_svm *svm =3D to_svm(vcpu); - unsigned long cr0 =3D vcpu->arch.cr0; + unsigned long cr0 =3D vcpu->arch.private->cr0; bool ret =3D false; =20 if (!is_guest_mode(vcpu) || @@ -2585,7 +2585,7 @@ static int cr_interception(struct kvm_vcpu *vcpu) val =3D kvm_read_cr0(vcpu); break; case 2: - val =3D 
vcpu->arch.cr2; + val =3D vcpu->arch.private->cr2; break; case 3: val =3D kvm_read_cr3(vcpu); @@ -3396,9 +3396,9 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpat= h_t exit_fastpath) /* SEV-ES guests must use the CR write traps to track CR registers. */ if (!sev_es_guest(vcpu->kvm)) { if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE)) - vcpu->arch.cr0 =3D svm->vmcb->save.cr0; + vcpu->arch.private->cr0 =3D svm->vmcb->save.cr0; if (npt_enabled) - vcpu->arch.cr3 =3D svm->vmcb->save.cr3; + vcpu->arch.private->cr3 =3D svm->vmcb->save.cr3; } =20 if (is_guest_mode(vcpu)) { @@ -3828,7 +3828,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vc= pu *vcpu) * vmcb02 when switching vmcbs for nested virtualization. */ vmload(svm->vmcb01.pa); - __svm_vcpu_run(vmcb_pa, (unsigned long *)&vcpu->arch.regs); + __svm_vcpu_run(vmcb_pa, (unsigned long *)&vcpu->arch.private->regs); vmsave(svm->vmcb01.pa); =20 vmload(__sme_page_pa(sd->save_area)); @@ -3843,9 +3843,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_= vcpu *vcpu) =20 trace_kvm_entry(vcpu); =20 - svm->vmcb->save.rax =3D vcpu->arch.regs[VCPU_REGS_RAX]; - svm->vmcb->save.rsp =3D vcpu->arch.regs[VCPU_REGS_RSP]; - svm->vmcb->save.rip =3D vcpu->arch.regs[VCPU_REGS_RIP]; + svm->vmcb->save.rax =3D vcpu->arch.private->regs[VCPU_REGS_RAX]; + svm->vmcb->save.rsp =3D vcpu->arch.private->regs[VCPU_REGS_RSP]; + svm->vmcb->save.rip =3D vcpu->arch.private->regs[VCPU_REGS_RIP]; =20 /* * Disable singlestep if we're injecting an interrupt/exception. 
@@ -3871,7 +3871,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_= vcpu *vcpu) svm->vmcb->control.asid =3D svm->asid; vmcb_mark_dirty(svm->vmcb, VMCB_ASID); } - svm->vmcb->save.cr2 =3D vcpu->arch.cr2; + svm->vmcb->save.cr2 =3D vcpu->arch.private->cr2; =20 svm_hv_update_vp_id(svm->vmcb, vcpu); =20 @@ -3926,10 +3926,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kv= m_vcpu *vcpu) x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); =20 if (!sev_es_guest(vcpu->kvm)) { - vcpu->arch.cr2 =3D svm->vmcb->save.cr2; - vcpu->arch.regs[VCPU_REGS_RAX] =3D svm->vmcb->save.rax; - vcpu->arch.regs[VCPU_REGS_RSP] =3D svm->vmcb->save.rsp; - vcpu->arch.regs[VCPU_REGS_RIP] =3D svm->vmcb->save.rip; + vcpu->arch.private->cr2 =3D svm->vmcb->save.cr2; + vcpu->arch.private->regs[VCPU_REGS_RAX] =3D svm->vmcb->save.rax; + vcpu->arch.private->regs[VCPU_REGS_RSP] =3D svm->vmcb->save.rsp; + vcpu->arch.private->regs[VCPU_REGS_RIP] =3D svm->vmcb->save.rip; } =20 if (unlikely(svm->vmcb->control.exit_code =3D=3D SVM_EXIT_NMI)) @@ -3999,8 +3999,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, h= pa_t root_hpa, /* Loading L2's CR3 is handled by enter_svm_guest_mode. */ if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail)) return; - cr3 =3D vcpu->arch.cr3; - } else if (vcpu->arch.mmu->shadow_root_level >=3D PT64_ROOT_4LEVEL) { + cr3 =3D vcpu->arch.private->cr3; + } else if (vcpu->arch.private->mmu->shadow_root_level >=3D PT64_ROOT_4LEV= EL) { cr3 =3D __sme_set(root_hpa) | kvm_get_active_pcid(vcpu); } else { /* PCID in the guest should be impossible with a 32-bit MMU. 
*/ @@ -4221,7 +4221,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, INTERCEPT_SELECTIVE_CR0))) break; =20 - cr0 =3D vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; + cr0 =3D vcpu->arch.private->cr0 & ~SVM_CR0_SELECTIVE_MASK; val =3D info->src_val & ~SVM_CR0_SELECTIVE_MASK; =20 if (info->intercept =3D=3D x86_intercept_lmsw) { @@ -4358,9 +4358,9 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char = *smstate) /* FEE0h - SVM Guest VMCB Physical Address */ put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa); =20 - svm->vmcb->save.rax =3D vcpu->arch.regs[VCPU_REGS_RAX]; - svm->vmcb->save.rsp =3D vcpu->arch.regs[VCPU_REGS_RSP]; - svm->vmcb->save.rip =3D vcpu->arch.regs[VCPU_REGS_RIP]; + svm->vmcb->save.rax =3D vcpu->arch.private->regs[VCPU_REGS_RAX]; + svm->vmcb->save.rsp =3D vcpu->arch.private->regs[VCPU_REGS_RSP]; + svm->vmcb->save.rip =3D vcpu->arch.private->regs[VCPU_REGS_RIP]; =20 ret =3D nested_svm_vmexit(svm); if (ret) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 953b0fcb21ee..2dc906dc9c13 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -791,13 +791,13 @@ TRACE_EVENT(kvm_emulate_insn, =20 TP_fast_assign( __entry->csbase =3D static_call(kvm_x86_get_segment_base)(vcpu, VCPU_SRE= G_CS); - __entry->len =3D vcpu->arch.emulate_ctxt->fetch.ptr - - vcpu->arch.emulate_ctxt->fetch.data; - __entry->rip =3D vcpu->arch.emulate_ctxt->_eip - __entry->len; + __entry->len =3D vcpu->arch.private->emulate_ctxt->fetch.ptr + - vcpu->arch.private->emulate_ctxt->fetch.data; + __entry->rip =3D vcpu->arch.private->emulate_ctxt->_eip - __entry->len; memcpy(__entry->insn, - vcpu->arch.emulate_ctxt->fetch.data, + vcpu->arch.private->emulate_ctxt->fetch.data, 15); - __entry->flags =3D kei_decode_mode(vcpu->arch.emulate_ctxt->mode); + __entry->flags =3D kei_decode_mode(vcpu->arch.private->emulate_ctxt->mod= e); __entry->failed =3D failed; ), =20 diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 0a0092e4102d..34b7621adf99 
100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -313,7 +313,7 @@ static void free_nested(struct kvm_vcpu *vcpu) kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); vmx->nested.pi_desc =3D NULL; =20 - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_AL= L); =20 nested_release_evmcs(vcpu); =20 @@ -356,11 +356,11 @@ static void nested_ept_invalidate_addr(struct kvm_vcp= u *vcpu, gpa_t eptp, WARN_ON_ONCE(!mmu_is_nested(vcpu)); =20 for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { - cached_root =3D &vcpu->arch.mmu->prev_roots[i]; + cached_root =3D &vcpu->arch.private->mmu->prev_roots[i]; =20 if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd, eptp)) - vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa); + vcpu->arch.private->mmu->invlpg(vcpu, addr, cached_root->hpa); } } =20 @@ -410,19 +410,19 @@ static void nested_ept_init_mmu_context(struct kvm_vc= pu *vcpu) { WARN_ON(mmu_is_nested(vcpu)); =20 - vcpu->arch.mmu =3D &vcpu->arch.guest_mmu; + vcpu->arch.private->mmu =3D &vcpu->arch.private->guest_mmu; nested_ept_new_eptp(vcpu); - vcpu->arch.mmu->get_guest_pgd =3D nested_ept_get_eptp; - vcpu->arch.mmu->inject_page_fault =3D nested_ept_inject_page_fault; - vcpu->arch.mmu->get_pdptr =3D kvm_pdptr_read; + vcpu->arch.private->mmu->get_guest_pgd =3D nested_ept_get_eptp; + vcpu->arch.private->mmu->inject_page_fault =3D nested_ept_inject_page_fau= lt; + vcpu->arch.private->mmu->get_pdptr =3D kvm_pdptr_read; =20 - vcpu->arch.walk_mmu =3D &vcpu->arch.nested_mmu; + vcpu->arch.private->walk_mmu =3D &vcpu->arch.private->nested= _mmu; } =20 static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu) { - vcpu->arch.mmu =3D &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; + vcpu->arch.private->mmu =3D &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu =3D &vcpu->arch.private->root_mmu; } =20 static bool 
nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, @@ -456,7 +456,7 @@ static int nested_vmx_check_exception(struct kvm_vcpu *= vcpu, unsigned long *exit } if (nested_vmx_is_page_fault_vmexit(vmcs12, vcpu->arch.exception.error_code)) { - *exit_qual =3D has_payload ? payload : vcpu->arch.cr2; + *exit_qual =3D has_payload ? payload : vcpu->arch.private->cr2; return 1; } } else if (vmcs12->exception_bitmap & (1u << nr)) { @@ -1103,7 +1103,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu,= unsigned long cr3, * must not be dereferenced. */ if (reload_pdptrs && !nested_ept && is_pae_paging(vcpu) && - CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))) { + CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, cr3))) { *entry_failure_code =3D ENTRY_FAIL_PDPTE; return -EINVAL; } @@ -1111,7 +1111,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu,= unsigned long cr3, if (!nested_ept) kvm_mmu_new_pgd(vcpu, cr3); =20 - vcpu->arch.cr3 =3D cr3; + vcpu->arch.private->cr3 =3D cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); =20 /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */ @@ -2508,8 +2508,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, stru= ct vmcs12 *vmcs12, * trap. Note that CR0.TS also needs updating - we do this later. 
*/ vmx_update_exception_bitmap(vcpu); - vcpu->arch.cr0_guest_owned_bits &=3D ~vmcs12->cr0_guest_host_mask; - vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits); + vcpu->arch.private->cr0_guest_owned_bits &=3D ~vmcs12->cr0_guest_host_mas= k; + vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.private->cr0_guest_owned_bit= s); =20 if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)) { @@ -2595,7 +2595,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, stru= ct vmcs12 *vmcs12, } =20 if (!enable_ept) - vcpu->arch.walk_mmu->inject_page_fault =3D vmx_inject_page_fault_nested; + vcpu->arch.private->walk_mmu->inject_page_fault =3D vmx_inject_page_faul= t_nested; =20 if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) && WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, @@ -3070,7 +3070,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcp= u *vcpu) vmx->loaded_vmcs->host_state.cr4 =3D cr4; } =20 - vm_fail =3D __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs, + vm_fail =3D __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.private->reg= s, vmx->loaded_vmcs->launched); =20 if (vmx->msr_autoload.host.nr) @@ -3153,7 +3153,7 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *= vcpu) * the guest CR3 might be restored prior to setting the nested * state which can lead to a load of wrong PDPTRs. */ - if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3))) + if (CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.priva= te->cr3))) return false; } =20 @@ -3370,18 +3370,18 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_= mode(struct kvm_vcpu *vcpu, * i.e. a VM-Fail detected by hardware but not KVM, KVM must unwind its * software model to the pre-VMEntry host state. When EPT is disabled, * GUEST_CR3 holds KVM's shadow CR3, not L1's "real" CR3, which causes - * nested_vmx_restore_host_state() to corrupt vcpu->arch.cr3. 
Stuffing - * vmcs01.GUEST_CR3 results in the unwind naturally setting arch.cr3 to + * nested_vmx_restore_host_state() to corrupt vcpu->arch.private->cr3. S= tuffing + * vmcs01.GUEST_CR3 results in the unwind naturally setting arch.private-= >cr3 to * the correct value. Smashing vmcs01.GUEST_CR3 is safe because nested * VM-Exits, and the unwind, reset KVM's MMU, i.e. vmcs01.GUEST_CR3 is * guaranteed to be overwritten with a shadow CR3 prior to re-entering * L1. Don't stuff vmcs01.GUEST_CR3 when using nested early checks as - * KVM modifies vcpu->arch.cr3 if and only if the early hardware checks + * KVM modifies vcpu->arch.private->cr3 if and only if the early hardware= checks * pass, and early VM-Fails do not reset KVM's MMU, i.e. the VM-Fail * path would need to manually save/restore vmcs01.GUEST_CR3. */ if (!enable_ept && !nested_early_check) - vmcs_writel(GUEST_CR3, vcpu->arch.cr3); + vmcs_writel(GUEST_CR3, vcpu->arch.private->cr3); =20 vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); =20 @@ -3655,20 +3655,20 @@ static inline unsigned long vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { return - /*1*/ (vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) | + /*1*/ (vmcs_readl(GUEST_CR0) & vcpu->arch.private->cr0_guest_owned_bits) | /*2*/ (vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask) | /*3*/ (vmcs_readl(CR0_READ_SHADOW) & ~(vmcs12->cr0_guest_host_mask | - vcpu->arch.cr0_guest_owned_bits)); + vcpu->arch.private->cr0_guest_owned_bits)); } =20 static inline unsigned long vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { return - /*1*/ (vmcs_readl(GUEST_CR4) & vcpu->arch.cr4_guest_owned_bits) | + /*1*/ (vmcs_readl(GUEST_CR4) & vcpu->arch.private->cr4_guest_owned_bits) | /*2*/ (vmcs12->guest_cr4 & vmcs12->cr4_guest_host_mask) | /*3*/ (vmcs_readl(CR4_READ_SHADOW) & ~(vmcs12->cr4_guest_host_mask | - vcpu->arch.cr4_guest_owned_bits)); + vcpu->arch.private->cr4_guest_owned_bits)); } =20 static void vmcs12_save_pending_event(struct 
kvm_vcpu *vcpu, @@ -4255,11 +4255,11 @@ static void load_vmcs12_host_state(struct kvm_vcpu = *vcpu, * CR0_GUEST_HOST_MASK is already set in the original vmcs01 * (KVM doesn't change it); */ - vcpu->arch.cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_BITS; + vcpu->arch.private->cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_BITS; vmx_set_cr0(vcpu, vmcs12->host_cr0); =20 /* Same as above - no reason to call set_cr4_guest_host_mask(). */ - vcpu->arch.cr4_guest_owned_bits =3D ~vmcs_readl(CR4_GUEST_HOST_MASK); + vcpu->arch.private->cr4_guest_owned_bits =3D ~vmcs_readl(CR4_GUEST_HOST_M= ASK); vmx_set_cr4(vcpu, vmcs12->host_cr4); =20 nested_ept_uninit_mmu_context(vcpu); @@ -4405,14 +4405,14 @@ static void nested_vmx_restore_host_state(struct kv= m_vcpu *vcpu) */ vmx_set_efer(vcpu, nested_vmx_get_vmcs01_guest_efer(vmx)); =20 - vcpu->arch.cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_BITS; + vcpu->arch.private->cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_BITS; vmx_set_cr0(vcpu, vmcs_readl(CR0_READ_SHADOW)); =20 - vcpu->arch.cr4_guest_owned_bits =3D ~vmcs_readl(CR4_GUEST_HOST_MASK); + vcpu->arch.private->cr4_guest_owned_bits =3D ~vmcs_readl(CR4_GUEST_HOST_M= ASK); vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW)); =20 nested_ept_uninit_mmu_context(vcpu); - vcpu->arch.cr3 =3D vmcs_readl(GUEST_CR3); + vcpu->arch.private->cr3 =3D vmcs_readl(GUEST_CR3); kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); =20 /* @@ -5000,7 +5000,7 @@ static inline void nested_release_vmcs12(struct kvm_v= cpu *vcpu) vmx->nested.current_vmptr >> PAGE_SHIFT, vmx->nested.cached_vmcs12, 0, VMCS12_SIZE); =20 - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_AL= L); =20 vmx->nested.current_vmptr =3D INVALID_GPA; } @@ -5427,7 +5427,7 @@ static int handle_invept(struct kvm_vcpu *vcpu) * Nested EPT roots are always held through guest_mmu, * not root_mmu. 
*/ - mmu =3D &vcpu->arch.guest_mmu; + mmu =3D &vcpu->arch.private->guest_mmu; =20 switch (type) { case VMX_EPT_EXTENT_CONTEXT: @@ -5545,7 +5545,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu) * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR. */ if (!enable_ept) - kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.root_mmu); + kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.private->root_mmu); =20 return nested_vmx_succeed(vcpu); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6e1bb017b696..beba656116d7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2242,20 +2242,20 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, en= um kvm_reg reg) =20 switch (reg) { case VCPU_REGS_RSP: - vcpu->arch.regs[VCPU_REGS_RSP] =3D vmcs_readl(GUEST_RSP); + vcpu->arch.private->regs[VCPU_REGS_RSP] =3D vmcs_readl(GUEST_RSP); break; case VCPU_REGS_RIP: - vcpu->arch.regs[VCPU_REGS_RIP] =3D vmcs_readl(GUEST_RIP); + vcpu->arch.private->regs[VCPU_REGS_RIP] =3D vmcs_readl(GUEST_RIP); break; case VCPU_EXREG_PDPTR: if (enable_ept) ept_save_pdptrs(vcpu); break; case VCPU_EXREG_CR0: - guest_owned_bits =3D vcpu->arch.cr0_guest_owned_bits; + guest_owned_bits =3D vcpu->arch.private->cr0_guest_owned_bits; =20 - vcpu->arch.cr0 &=3D ~guest_owned_bits; - vcpu->arch.cr0 |=3D vmcs_readl(GUEST_CR0) & guest_owned_bits; + vcpu->arch.private->cr0 &=3D ~guest_owned_bits; + vcpu->arch.private->cr0 |=3D vmcs_readl(GUEST_CR0) & guest_owned_bits; break; case VCPU_EXREG_CR3: /* @@ -2263,13 +2263,13 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, en= um kvm_reg reg) * CR3 is loaded into hardware, not the guest's CR3. 
*/ if (!(exec_controls_get(to_vmx(vcpu)) & CPU_BASED_CR3_LOAD_EXITING)) - vcpu->arch.cr3 =3D vmcs_readl(GUEST_CR3); + vcpu->arch.private->cr3 =3D vmcs_readl(GUEST_CR3); break; case VCPU_EXREG_CR4: - guest_owned_bits =3D vcpu->arch.cr4_guest_owned_bits; + guest_owned_bits =3D vcpu->arch.private->cr4_guest_owned_bits; =20 - vcpu->arch.cr4 &=3D ~guest_owned_bits; - vcpu->arch.cr4 |=3D vmcs_readl(GUEST_CR4) & guest_owned_bits; + vcpu->arch.private->cr4 &=3D ~guest_owned_bits; + vcpu->arch.private->cr4 |=3D vmcs_readl(GUEST_CR4) & guest_owned_bits; break; default: KVM_BUG_ON(1, vcpu->kvm); @@ -2926,7 +2926,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcp= u *vcpu) =20 static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; u64 root_hpa =3D mmu->root_hpa; =20 /* No flush required if the current context is invalid. */ @@ -2963,7 +2963,7 @@ static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu) =20 void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.walk_mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->walk_mmu; =20 if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR)) return; @@ -2978,7 +2978,7 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu) =20 void ept_save_pdptrs(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu =3D vcpu->arch.walk_mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->walk_mmu; =20 if (WARN_ON_ONCE(!is_pae_paging(vcpu))) return; @@ -3019,7 +3019,7 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long= cr0) =20 vmcs_writel(CR0_READ_SHADOW, cr0); vmcs_writel(GUEST_CR0, hw_cr0); - vcpu->arch.cr0 =3D cr0; + vcpu->arch.private->cr0 =3D cr0; kvm_register_mark_available(vcpu, VCPU_EXREG_CR0); =20 #ifdef CONFIG_X86_64 @@ -3067,12 +3067,12 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned lo= ng cr0) exec_controls_set(vmx, tmp); } =20 - /* Note, vmx_set_cr4() consumes the new vcpu->arch.cr0. 
*/ + /* Note, vmx_set_cr4() consumes the new vcpu->arch.private->cr0. */ if ((old_cr0_pg ^ cr0) & X86_CR0_PG) vmx_set_cr4(vcpu, kvm_read_cr4(vcpu)); } =20 - /* depends on vcpu->arch.cr0 to be set to a new value */ + /* depends on vcpu->arch.private->cr0 to be set to a new value */ vmx->emulation_required =3D vmx_emulation_required(vcpu); } =20 @@ -3114,7 +3114,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, h= pa_t root_hpa, if (!enable_unrestricted_guest && !is_paging(vcpu)) guest_cr3 =3D to_kvm_vmx(kvm)->ept_identity_map_addr; else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail)) - guest_cr3 =3D vcpu->arch.cr3; + guest_cr3 =3D vcpu->arch.private->cr3; else /* vmcs01.GUEST_CR3 is already up-to-date. */ update_guest_cr3 =3D false; vmx_ept_load_pdptrs(vcpu); @@ -3144,7 +3144,7 @@ static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, u= nsigned long cr4) =20 void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { - unsigned long old_cr4 =3D vcpu->arch.cr4; + unsigned long old_cr4 =3D vcpu->arch.private->cr4; struct vcpu_vmx *vmx =3D to_vmx(vcpu); /* * Pass through host's Machine Check Enable value to hw_cr4, which @@ -3171,7 +3171,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long= cr4) } } =20 - vcpu->arch.cr4 =3D cr4; + vcpu->arch.private->cr4 =3D cr4; kvm_register_mark_available(vcpu, VCPU_EXREG_CR4); =20 if (!is_unrestricted_guest(vcpu)) { @@ -4040,14 +4040,14 @@ void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) { struct kvm_vcpu *vcpu =3D &vmx->vcpu; =20 - vcpu->arch.cr4_guest_owned_bits =3D KVM_POSSIBLE_CR4_GUEST_BITS & - ~vcpu->arch.cr4_guest_rsvd_bits; + vcpu->arch.private->cr4_guest_owned_bits =3D KVM_POSSIBLE_CR4_GUEST_BITS & + ~vcpu->arch.private->cr4_guest_rsvd_bits; if (!enable_ept) - vcpu->arch.cr4_guest_owned_bits &=3D ~X86_CR4_PGE; + vcpu->arch.private->cr4_guest_owned_bits &=3D ~X86_CR4_PGE; if (is_guest_mode(&vmx->vcpu)) - vcpu->arch.cr4_guest_owned_bits &=3D + vcpu->arch.private->cr4_guest_owned_bits &=3D 
~get_vmcs12(vcpu)->cr4_guest_host_mask; - vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.cr4_guest_owned_bits); + vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.private->cr4_guest_owned_bit= s); } =20 static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx) @@ -4345,8 +4345,8 @@ static void init_vmcs(struct vcpu_vmx *vmx) /* 22.2.1, 20.8.1 */ vm_entry_controls_set(vmx, vmx_vmentry_ctrl()); =20 - vmx->vcpu.arch.cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_BITS; - vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits); + vmx->vcpu.arch.private->cr0_guest_owned_bits =3D KVM_POSSIBLE_CR0_GUEST_B= ITS; + vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.private->cr0_guest_owned= _bits); =20 set_cr4_guest_host_mask(vmx); =20 @@ -4956,7 +4956,7 @@ static int handle_set_cr4(struct kvm_vcpu *vcpu, unsi= gned long val) =20 static int handle_desc(struct kvm_vcpu *vcpu) { - WARN_ON(!(vcpu->arch.cr4 & X86_CR4_UMIP)); + WARN_ON(!(vcpu->arch.private->cr4 & X86_CR4_UMIP)); return kvm_emulate_instruction(vcpu, 0); } =20 @@ -6626,13 +6626,13 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_= vcpu *vcpu, vmx->loaded_vmcs->host_state.cr3 =3D cr3; } =20 - if (vcpu->arch.cr2 !=3D native_read_cr2()) - native_write_cr2(vcpu->arch.cr2); + if (vcpu->arch.private->cr2 !=3D native_read_cr2()) + native_write_cr2(vcpu->arch.private->cr2); =20 - vmx->fail =3D __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs, + vmx->fail =3D __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.private->r= egs, vmx->loaded_vmcs->launched); =20 - vcpu->arch.cr2 =3D native_read_cr2(); + vcpu->arch.private->cr2 =3D native_read_cr2(); =20 VM_WARN_ON_ONCE(vcpu->kvm->asi && !is_asi_active()); asi_set_target_unrestricted(); @@ -6681,9 +6681,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) WARN_ON_ONCE(vmx->nested.need_vmcs12_to_shadow_sync); =20 if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP)) - vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]); + vmcs_writel(GUEST_RSP, 
vcpu->arch.private->regs[VCPU_REGS_RSP]); if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP)) - vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); + vmcs_writel(GUEST_RIP, vcpu->arch.private->regs[VCPU_REGS_RIP]); =20 cr4 =3D cr4_read_shadow(); if (unlikely(cr4 !=3D vmx->loaded_vmcs->host_state.cr4)) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dd862edc1b5a..680725089a18 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -595,7 +595,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcp= u) vcpu->arch.dr6 &=3D ~BIT(12); break; case PF_VECTOR: - vcpu->arch.cr2 =3D payload; + vcpu->arch.private->cr2 =3D payload; break; } =20 @@ -736,8 +736,8 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vc= pu, struct kvm_mmu *fault_mmu; WARN_ON_ONCE(fault->vector !=3D PF_VECTOR); =20 - fault_mmu =3D fault->nested_page_fault ? vcpu->arch.mmu : - vcpu->arch.walk_mmu; + fault_mmu =3D fault->nested_page_fault ? vcpu->arch.private->mmu : + vcpu->arch.private->walk_mmu; =20 /* * Invalidate the TLB entry for the faulting address, if it exists, @@ -892,7 +892,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr= 0) #endif if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) && is_pae(vcpu) && ((cr0 ^ old_cr0) & pdptr_bits) && - !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu))) + !load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu))) return 1; =20 if (!(cr0 & X86_CR0_PG) && @@ -920,8 +920,8 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) =20 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) { =20 - if (vcpu->arch.xcr0 !=3D host_xcr0) - xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); + if (vcpu->arch.private->xcr0 !=3D host_xcr0) + xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.private->xcr0); =20 if (vcpu->arch.xsaves_enabled && vcpu->arch.ia32_xss !=3D host_xss) @@ -930,7 +930,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) =20 if (static_cpu_has(X86_FEATURE_PKU) && (kvm_read_cr4_bits(vcpu, 
X86_CR4_PKE) || - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) && + (vcpu->arch.private->xcr0 & XFEATURE_MASK_PKRU)) && vcpu->arch.pkru !=3D vcpu->arch.host_pkru) write_pkru(vcpu->arch.pkru); } @@ -943,7 +943,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) =20 if (static_cpu_has(X86_FEATURE_PKU) && (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) { + (vcpu->arch.private->xcr0 & XFEATURE_MASK_PKRU))) { vcpu->arch.pkru =3D rdpkru(); if (vcpu->arch.pkru !=3D vcpu->arch.host_pkru) write_pkru(vcpu->arch.host_pkru); @@ -951,7 +951,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) =20 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) { =20 - if (vcpu->arch.xcr0 !=3D host_xcr0) + if (vcpu->arch.private->xcr0 !=3D host_xcr0) xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); =20 if (vcpu->arch.xsaves_enabled && @@ -965,7 +965,7 @@ EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state); static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) { u64 xcr0 =3D xcr; - u64 old_xcr0 =3D vcpu->arch.xcr0; + u64 old_xcr0 =3D vcpu->arch.private->xcr0; u64 valid_bits; =20 /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */ @@ -981,7 +981,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 ind= ex, u64 xcr) * saving. However, xcr0 bit 0 is always set, even if the * emulated CPU does not support XSAVE (see kvm_vcpu_reset()). 
*/ - valid_bits =3D vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP; + valid_bits =3D vcpu->arch.private->guest_supported_xcr0 | XFEATURE_MASK_F= P; if (xcr0 & ~valid_bits) return 1; =20 @@ -995,7 +995,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 ind= ex, u64 xcr) if ((xcr0 & XFEATURE_MASK_AVX512) !=3D XFEATURE_MASK_AVX512) return 1; } - vcpu->arch.xcr0 =3D xcr0; + vcpu->arch.private->xcr0 =3D xcr0; =20 if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND) kvm_update_cpuid_runtime(vcpu); @@ -1019,7 +1019,7 @@ bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned= long cr4) if (cr4 & cr4_reserved_bits) return false; =20 - if (cr4 & vcpu->arch.cr4_guest_rsvd_bits) + if (cr4 & vcpu->arch.private->cr4_guest_rsvd_bits) return false; =20 return static_call(kvm_x86_is_valid_cr4)(vcpu, cr4); @@ -1069,7 +1069,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long = cr4) return 1; } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE) && ((cr4 ^ old_cr4) & pdptr_bits) - && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, + && !load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu))) return 1; =20 @@ -1092,7 +1092,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4); =20 static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid) { - struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm_mmu *mmu =3D vcpu->arch.private->mmu; unsigned long roots_to_free =3D 0; int i; =20 @@ -1159,13 +1159,13 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned lon= g cr3) if (kvm_vcpu_is_illegal_gpa(vcpu, cr3)) return 1; =20 - if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) + if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.private->walk_mm= u, cr3)) return 1; =20 if (cr3 !=3D kvm_read_cr3(vcpu)) kvm_mmu_new_pgd(vcpu, cr3); =20 - vcpu->arch.cr3 =3D cr3; + vcpu->arch.private->cr3 =3D cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); =20 handle_tlb_flush: @@ -1190,7 +1190,7 @@ int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long = cr8) if 
(lapic_in_kernel(vcpu)) kvm_lapic_set_tpr(vcpu, cr8); else - vcpu->arch.cr8 =3D cr8; + vcpu->arch.private->cr8 =3D cr8; return 0; } EXPORT_SYMBOL_GPL(kvm_set_cr8); @@ -1200,7 +1200,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) if (lapic_in_kernel(vcpu)) return kvm_lapic_get_cr8(vcpu); else - return vcpu->arch.cr8; + return vcpu->arch.private->cr8; } EXPORT_SYMBOL_GPL(kvm_get_cr8); =20 @@ -4849,10 +4849,10 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct = kvm_vcpu *vcpu, static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return; =20 - fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, + fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.private->guest_fpu, guest_xsave->region, sizeof(guest_xsave->region), vcpu->arch.pkru); @@ -4861,10 +4861,10 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm= _vcpu *vcpu, static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; =20 - return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, + return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.private->guest_fpu, guest_xsave->region, supported_xcr0, &vcpu->arch.pkru); } @@ -4880,7 +4880,7 @@ static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vc= pu *vcpu, guest_xcrs->nr_xcrs =3D 1; guest_xcrs->flags =3D 0; guest_xcrs->xcrs[0].xcr =3D XCR_XFEATURE_ENABLED_MASK; - guest_xcrs->xcrs[0].value =3D vcpu->arch.xcr0; + guest_xcrs->xcrs[0].value =3D vcpu->arch.private->xcr0; } =20 static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, @@ -6516,7 +6516,7 @@ gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa= _t gpa, u32 access, =20 /* NPT walks are always user-walks */ access |=3D PFERR_USER_MASK; - t_gpa =3D vcpu->arch.mmu->gva_to_gpa(vcpu, 
gpa, access, exception); + t_gpa =3D vcpu->arch.private->mmu->gva_to_gpa(vcpu, gpa, access, excepti= on); =20 return t_gpa; } @@ -6525,7 +6525,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, = gva_t gva, struct x86_exception *exception) { u32 access =3D (static_call(kvm_x86_get_cpl)(vcpu) =3D=3D 3) ? PFERR_USER= _MASK : 0; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, except= ion); } EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read); =20 @@ -6534,7 +6534,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read); { u32 access =3D (static_call(kvm_x86_get_cpl)(vcpu) =3D=3D 3) ? PFERR_USER= _MASK : 0; access |=3D PFERR_FETCH_MASK; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, except= ion); } =20 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, @@ -6542,7 +6542,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu,= gva_t gva, { u32 access =3D (static_call(kvm_x86_get_cpl)(vcpu) =3D=3D 3) ? 
PFERR_USER= _MASK : 0; access |=3D PFERR_WRITE_MASK; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, except= ion); } EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write); =20 @@ -6550,7 +6550,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write); gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception) { - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, 0, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, 0, exception); } =20 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int = bytes, @@ -6561,7 +6561,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, voi= d *val, unsigned int bytes, int r =3D X86EMUL_CONTINUE; =20 while (bytes) { - gpa_t gpa =3D vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access, + gpa_t gpa =3D vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, acces= s, exception); unsigned offset =3D addr & (PAGE_SIZE-1); unsigned toread =3D min(bytes, (unsigned)PAGE_SIZE - offset); @@ -6595,7 +6595,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ct= xt *ctxt, int ret; =20 /* Inline kvm_read_guest_virt_helper for speed. */ - gpa_t gpa =3D vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FE= TCH_MASK, + gpa_t gpa =3D vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, access= |PFERR_FETCH_MASK, exception); if (unlikely(gpa =3D=3D UNMAPPED_GVA)) return X86EMUL_PROPAGATE_FAULT; @@ -6659,7 +6659,7 @@ static int kvm_write_guest_virt_helper(gva_t addr, vo= id *val, unsigned int bytes int r =3D X86EMUL_CONTINUE; =20 while (bytes) { - gpa_t gpa =3D vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, + gpa_t gpa =3D vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, access, exception); unsigned offset =3D addr & (PAGE_SIZE-1); @@ -6757,7 +6757,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu= , unsigned long gva, * shadow page table for L2 guest. 
*/ if (vcpu_match_mmio_gva(vcpu, gva) && (!is_paging(vcpu) || - !permission_fault(vcpu, vcpu->arch.walk_mmu, + !permission_fault(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.mmio_access, 0, access))) { *gpa =3D vcpu->arch.mmio_gfn << PAGE_SHIFT | (gva & (PAGE_SIZE - 1)); @@ -6765,7 +6765,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu= , unsigned long gva, return 1; } =20 - *gpa =3D vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + *gpa =3D vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, exce= ption); =20 if (*gpa =3D=3D UNMAPPED_GVA) return -1; @@ -6867,7 +6867,7 @@ static int emulator_read_write_onepage(unsigned long = addr, void *val, int handled, ret; bool write =3D ops->write; struct kvm_mmio_fragment *frag; - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; =20 /* * If the exit was due to a NPF we may already have a GPA. @@ -7246,7 +7246,7 @@ static unsigned long emulator_get_cr(struct x86_emula= te_ctxt *ctxt, int cr) value =3D kvm_read_cr0(vcpu); break; case 2: - value =3D vcpu->arch.cr2; + value =3D vcpu->arch.private->cr2; break; case 3: value =3D kvm_read_cr3(vcpu); @@ -7275,7 +7275,7 @@ static int emulator_set_cr(struct x86_emulate_ctxt *c= txt, int cr, ulong val) res =3D kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); break; case 2: - vcpu->arch.cr2 =3D val; + vcpu->arch.private->cr2 =3D val; break; case 3: res =3D kvm_set_cr3(vcpu, val); @@ -7597,7 +7597,7 @@ static void toggle_interruptibility(struct kvm_vcpu *= vcpu, u32 mask) =20 static bool inject_emulated_exception(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; if (ctxt->exception.vector =3D=3D PF_VECTOR) return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); =20 @@ -7621,14 +7621,14 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(= struct kvm_vcpu 
*vcpu) =20 ctxt->vcpu =3D vcpu; ctxt->ops =3D &emulate_ops; - vcpu->arch.emulate_ctxt =3D ctxt; + vcpu->arch.private->emulate_ctxt =3D ctxt; =20 return ctxt; } =20 static void init_emulate_ctxt(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; int cs_db, cs_l; =20 static_call(kvm_x86_get_cs_db_l_bits)(vcpu, &cs_db, &cs_l); @@ -7658,7 +7658,7 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu) =20 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc= _eip) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; int ret; =20 init_emulate_ctxt(vcpu); @@ -7731,7 +7731,7 @@ static void prepare_emulation_failure_exit(struct kvm= _vcpu *vcpu, u64 *data, =20 static void prepare_emulation_ctxt_failure_exit(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; =20 prepare_emulation_failure_exit(vcpu, NULL, 0, ctxt->fetch.data, ctxt->fetch.end - ctxt->fetch.data); @@ -7792,7 +7792,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vc= pu, gpa_t cr2_or_gpa, WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))) return false; =20 - if (!vcpu->arch.mmu->direct_map) { + if (!vcpu->arch.private->mmu->direct_map) { /* * Write permission should be allowed since only * write access need to be emulated. @@ -7825,7 +7825,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vc= pu, gpa_t cr2_or_gpa, kvm_release_pfn_clean(pfn); =20 /* The instructions are well-emulated on direct mmu. 
*/ - if (vcpu->arch.mmu->direct_map) { + if (vcpu->arch.private->mmu->direct_map) { unsigned int indirect_shadow_pages; =20 write_lock(&vcpu->kvm->mmu_lock); @@ -7893,7 +7893,7 @@ static bool retry_instruction(struct x86_emulate_ctxt= *ctxt, vcpu->arch.last_retry_eip =3D ctxt->eip; vcpu->arch.last_retry_addr =3D cr2_or_gpa; =20 - if (!vcpu->arch.mmu->direct_map) + if (!vcpu->arch.private->mmu->direct_map) gpa =3D kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL); =20 kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); @@ -8055,7 +8055,7 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *= vcpu, int emulation_type, void *insn, int insn_len) { int r =3D EMULATION_OK; - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; =20 init_emulate_ctxt(vcpu); =20 @@ -8081,7 +8081,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gp= a_t cr2_or_gpa, int emulation_type, void *insn, int insn_len) { int r; - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; bool writeback =3D true; bool write_fault_to_spt; =20 @@ -8160,7 +8160,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gp= a_t cr2_or_gpa, ctxt->exception.address =3D cr2_or_gpa; =20 /* With shadow page tables, cr2 contains a GVA or nGPA. 
*/ - if (vcpu->arch.mmu->direct_map) { + if (vcpu->arch.private->mmu->direct_map) { ctxt->gpa_available =3D true; ctxt->gpa_val =3D cr2_or_gpa; } @@ -9484,9 +9484,9 @@ static void enter_smm(struct kvm_vcpu *vcpu) kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); kvm_rip_write(vcpu, 0x8000); =20 - cr0 =3D vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0= _PG); + cr0 =3D vcpu->arch.private->cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS = | X86_CR0_PG); static_call(kvm_x86_set_cr0)(vcpu, cr0); - vcpu->arch.cr0 =3D cr0; + vcpu->arch.private->cr0 =3D cr0; =20 static_call(kvm_x86_set_cr4)(vcpu, 0); =20 @@ -10245,14 +10245,14 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *v= cpu) * Exclude PKRU from restore as restored separately in * kvm_x86_ops.run(). */ - fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true); + fpu_swap_kvm_fpstate(&vcpu->arch.private->guest_fpu, true); trace_kvm_fpu(1); } =20 /* When vcpu_run ends, restore user space FPU context. */ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) { - fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false); + fpu_swap_kvm_fpstate(&vcpu->arch.private->guest_fpu, false); ++vcpu->stat.fpu_reload; trace_kvm_fpu(0); } @@ -10342,7 +10342,7 @@ static void __get_regs(struct kvm_vcpu *vcpu, struc= t kvm_regs *regs) * that usually, but some bad designed PV devices (vmware * backdoor interface) need this to work */ - emulator_writeback_register_cache(vcpu->arch.emulate_ctxt); + emulator_writeback_register_cache(vcpu->arch.private->emulate_ctxt); vcpu->arch.emulate_regs_need_sync_to_vcpu =3D false; } regs->rax =3D kvm_rax_read(vcpu); @@ -10450,7 +10450,7 @@ static void __get_sregs_common(struct kvm_vcpu *vcp= u, struct kvm_sregs *sregs) sregs->gdt.limit =3D dt.size; sregs->gdt.base =3D dt.address; =20 - sregs->cr2 =3D vcpu->arch.cr2; + sregs->cr2 =3D vcpu->arch.private->cr2; sregs->cr3 =3D kvm_read_cr3(vcpu); =20 skip_protected_regs: @@ -10563,7 +10563,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu= *vcpu, int 
kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index, int reason, bool has_error_code, u32 error_code) { - struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.private->emulate_ctxt; int ret; =20 init_emulate_ctxt(vcpu); @@ -10632,9 +10632,9 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu= , struct kvm_sregs *sregs, dt.address =3D sregs->gdt.base; static_call(kvm_x86_set_gdt)(vcpu, &dt); =20 - vcpu->arch.cr2 =3D sregs->cr2; + vcpu->arch.private->cr2 =3D sregs->cr2; *mmu_reset_needed |=3D kvm_read_cr3(vcpu) !=3D sregs->cr3; - vcpu->arch.cr3 =3D sregs->cr3; + vcpu->arch.private->cr3 =3D sregs->cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); =20 kvm_set_cr8(vcpu, sregs->cr8); @@ -10644,7 +10644,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu= , struct kvm_sregs *sregs, =20 *mmu_reset_needed |=3D kvm_read_cr0(vcpu) !=3D sregs->cr0; static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0); - vcpu->arch.cr0 =3D sregs->cr0; + vcpu->arch.private->cr0 =3D sregs->cr0; =20 *mmu_reset_needed |=3D kvm_read_cr4(vcpu) !=3D sregs->cr4; static_call(kvm_x86_set_cr4)(vcpu, sregs->cr4); @@ -10652,7 +10652,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu= , struct kvm_sregs *sregs, if (update_pdptrs) { idx =3D srcu_read_lock(&vcpu->kvm->srcu); if (is_pae_paging(vcpu)) { - load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)); + load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu)); *mmu_reset_needed =3D 1; } srcu_read_unlock(&vcpu->kvm->srcu, idx); @@ -10853,12 +10853,12 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *= vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; =20 - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; =20 vcpu_load(vcpu); =20 - fxsave =3D &vcpu->arch.guest_fpu.fpstate->regs.fxsave; + fxsave =3D &vcpu->arch.private->guest_fpu.fpstate->regs.fxsave; memcpy(fpu->fpr, 
fxsave->st_space, 128); fpu->fcw =3D fxsave->cwd; fpu->fsw =3D fxsave->swd; @@ -10876,12 +10876,12 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *= vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; =20 - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; =20 vcpu_load(vcpu); =20 - fxsave =3D &vcpu->arch.guest_fpu.fpstate->regs.fxsave; + fxsave =3D &vcpu->arch.private->guest_fpu.fpstate->regs.fxsave; =20 memcpy(fxsave->st_space, fpu->fpr, 128); fxsave->cwd =3D fpu->fcw; @@ -10988,7 +10988,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) if (!alloc_emulate_ctxt(vcpu)) goto free_wbinvd_dirty_mask; =20 - if (!fpu_alloc_guest_fpstate(&vcpu->arch.guest_fpu)) { + if (!fpu_alloc_guest_fpstate(&vcpu->arch.private->guest_fpu)) { pr_err("kvm: failed to allocate vcpu's fpu\n"); goto free_emulate_ctxt; } @@ -11023,9 +11023,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) return 0; =20 free_guest_fpu: - fpu_free_guest_fpstate(&vcpu->arch.guest_fpu); + fpu_free_guest_fpstate(&vcpu->arch.private->guest_fpu); free_emulate_ctxt: - kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt); + kmem_cache_free(x86_emulator_cache, vcpu->arch.private->emulate_ctxt); free_wbinvd_dirty_mask: free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); fail_free_mce_banks: @@ -11067,9 +11067,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) =20 static_call(kvm_x86_vcpu_free)(vcpu); =20 - kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt); + kmem_cache_free(x86_emulator_cache, vcpu->arch.private->emulate_ctxt); free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); - fpu_free_guest_fpstate(&vcpu->arch.guest_fpu); + fpu_free_guest_fpstate(&vcpu->arch.private->guest_fpu); =20 kvm_hv_vcpu_uninit(vcpu); kvm_pmu_destroy(vcpu); @@ -11118,7 +11118,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool ini= t_event) vcpu->arch.dr7 =3D DR7_FIXED_1; kvm_update_dr7(vcpu); =20 - vcpu->arch.cr2 =3D 0; + 
vcpu->arch.private->cr2 =3D 0; =20 kvm_make_request(KVM_REQ_EVENT, vcpu); vcpu->arch.apf.msr_en_val =3D 0; @@ -11131,8 +11131,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool ini= t_event) kvm_async_pf_hash_reset(vcpu); vcpu->arch.apf.halted =3D false; =20 - if (vcpu->arch.guest_fpu.fpstate && kvm_mpx_supported()) { - struct fpstate *fpstate =3D vcpu->arch.guest_fpu.fpstate; + if (vcpu->arch.private->guest_fpu.fpstate && kvm_mpx_supported()) { + struct fpstate *fpstate =3D vcpu->arch.private->guest_fpu.fpstate; =20 /* * To avoid have the INIT path from kvm_apic_has_events() that be @@ -11154,11 +11154,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool i= nit_event) =20 vcpu->arch.msr_misc_features_enables =3D 0; =20 - vcpu->arch.xcr0 =3D XFEATURE_MASK_FP; + vcpu->arch.private->xcr0 =3D XFEATURE_MASK_FP; } =20 /* All GPRs except RDX (handled below) are zeroed on RESET/INIT. */ - memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); + memset(vcpu->arch.private->regs, 0, sizeof(vcpu->arch.private->regs)); kvm_register_mark_dirty(vcpu, VCPU_REGS_RSP); =20 /* @@ -11178,7 +11178,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool ini= t_event) kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); kvm_rip_write(vcpu, 0xfff0); =20 - vcpu->arch.cr3 =3D 0; + vcpu->arch.private->cr3 =3D 0; kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); =20 /* @@ -12043,7 +12043,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcp= u, struct kvm_async_pf *work) { int r; =20 - if ((vcpu->arch.mmu->direct_map !=3D work->arch.direct_map) || + if ((vcpu->arch.private->mmu->direct_map !=3D work->arch.direct_map) || work->wakeup_all) return; =20 @@ -12051,8 +12051,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcp= u, struct kvm_async_pf *work) if (unlikely(r)) return; =20 - if (!vcpu->arch.mmu->direct_map && - work->arch.cr3 !=3D vcpu->arch.mmu->get_guest_pgd(vcpu)) + if (!vcpu->arch.private->mmu->direct_map && + work->arch.cr3 !=3D vcpu->arch.private->mmu->get_guest_pgd(vcpu)) return; =20 
kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true); @@ -12398,9 +12398,9 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu = *vcpu, gva_t gva, u16 error_c (PFERR_WRITE_MASK | PFERR_FETCH_MASK | PFERR_USER_MASK); =20 if (!(error_code & PFERR_PRESENT_MASK) || - vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, &fault) !=3D UNMAP= PED_GVA) { + vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, &fault) != =3D UNMAPPED_GVA) { /* - * If vcpu->arch.walk_mmu->gva_to_gpa succeeded, the page + * If vcpu->arch.private->walk_mmu->gva_to_gpa succeeded, the page * tables probably do not match the TLB. Just proceed * with the error code that the processor gave. */ @@ -12410,7 +12410,7 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu = *vcpu, gva_t gva, u16 error_c fault.nested_page_fault =3D false; fault.address =3D gva; } - vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault); + vcpu->arch.private->walk_mmu->inject_page_fault(vcpu, &fault); } EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error); =20 diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 3d5da4daaf53..dbcb6551d111 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -183,7 +183,7 @@ static inline bool x86_exception_has_error_code(unsigne= d int vector) =20 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu) { - return vcpu->arch.walk_mmu =3D=3D &vcpu->arch.nested_mmu; + return vcpu->arch.private->walk_mmu =3D=3D &vcpu->arch.private->nested_mm= u; } =20 static inline int is_pae(struct kvm_vcpu *vcpu) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 587a75428da8..3c4e27c5aea9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -109,6 +109,8 @@ static atomic_t hardware_enable_failed; =20 static struct kmem_cache *kvm_vcpu_cache; =20 +static struct kmem_cache *kvm_vcpu_private_cache; + static __read_mostly struct preempt_ops kvm_preempt_ops; static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, kvm_running_vcp= u); =20 @@ -457,6 +459,7 @@ void 
kvm_vcpu_destroy(struct kvm_vcpu *vcpu) put_pid(rcu_dereference_protected(vcpu->pid, 1)); =20 free_page((unsigned long)vcpu->run); + kmem_cache_free(kvm_vcpu_private_cache, vcpu->arch.private); kmem_cache_free(kvm_vcpu_cache, vcpu); } EXPORT_SYMBOL_GPL(kvm_vcpu_destroy); @@ -2392,7 +2395,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct = *vma, * tail pages of non-compound higher order allocations, which * would then underflow the refcount when the caller does the * required put_page. Don't allow those pages here. - */=20 + */ if (!kvm_try_get_pfn(pfn)) r =3D -EFAULT; =20 @@ -3562,17 +3565,25 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm= , u32 id) if (r) goto vcpu_decrement; =20 - vcpu =3D kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL_ACCOUNT); + vcpu =3D kmem_cache_zalloc(kvm_vcpu_cache, + GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENS= ITIVE); if (!vcpu) { r =3D -ENOMEM; goto vcpu_decrement; } =20 + vcpu->arch.private =3D kmem_cache_zalloc(kvm_vcpu_private_cache, + GFP_KERNEL | __GFP_LOCAL_NO= NSENSITIVE); + if (!vcpu->arch.private) { + r =3D -ENOMEM; + goto vcpu_free; + } + BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE); page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENS= ITIVE); if (!page) { r =3D -ENOMEM; - goto vcpu_free; + goto vcpu_private_free; } vcpu->run =3D page_address(page); =20 @@ -3631,6 +3642,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, = u32 id) kvm_arch_vcpu_destroy(vcpu); vcpu_free_run_page: free_page((unsigned long)vcpu->run); +vcpu_private_free: + kmem_cache_free(kvm_vcpu_private_cache, vcpu->arch.private); vcpu_free: kmem_cache_free(kvm_vcpu_cache, vcpu); vcpu_decrement: @@ -5492,7 +5505,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsign= ed vcpu_align, vcpu_align =3D __alignof__(struct kvm_vcpu); kvm_vcpu_cache =3D kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align, - SLAB_ACCOUNT, + SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, offsetof(struct kvm_vcpu, arch), offsetofend(struct 
kvm_vcpu, stats_id) - offsetof(struct kvm_vcpu, arch), @@ -5501,12 +5514,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsi= gned vcpu_align, r =3D -ENOMEM; goto out_free_3; } - + kvm_vcpu_private_cache =3D kmem_cache_create_usercopy("kvm_vcpu_pr= ivate", + sizeof(struct kvm_vcpu_arch_private), + __alignof__(struct kvm_vcpu_arch_privat= e), + SLAB_ACCOUNT | SLAB_LOCAL_NONSENSITIVE, + 0, + sizeof(struct kvm_vcpu_arch_private), + NULL); + if (!kvm_vcpu_private_cache) { + r =3D -ENOMEM; + goto out_free_4; + } for_each_possible_cpu(cpu) { if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu), GFP_KERNEL, cpu_to_node(cpu))) { r =3D -ENOMEM; - goto out_free_4; + goto out_free_vcpu_private_cache; } } =20 @@ -5541,6 +5564,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsign= ed vcpu_align, out_free_5: for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); +out_free_vcpu_private_cache: + kmem_cache_destroy(kvm_vcpu_private_cache); out_free_4: kmem_cache_destroy(kvm_vcpu_cache); out_free_3: @@ -5567,6 +5592,7 @@ void kvm_exit(void) misc_deregister(&kvm_dev); for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); + kmem_cache_destroy(kvm_vcpu_private_cache); kmem_cache_destroy(kvm_vcpu_cache); kvm_async_pf_deinit(); unregister_syscore_ops(&kvm_syscore_ops); --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 128F2C433EF for ; Wed, 23 Feb 2022 05:28:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238426AbiBWF3B (ORCPT ); Wed, 23 Feb 2022 00:29:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238465AbiBWF1U (ORCPT ); Wed, 23 
Feb 2022 00:27:20 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AB006EB2E for ; Tue, 22 Feb 2022 21:25:33 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-2d61b4ef6cdso162266877b3.11 for ; Tue, 22 Feb 2022 21:25:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Dq9qt2AUp1awc/cLmelSzNKRNDnSx+0iG/3BD+f2Pqw=; b=bPNyVJMg6CFD9HMFXRNSQizMrYjSNrjLiLhbm2g1SK1dzDwAFw9IVrAGvhhX45jt5Q MB3m8YKlJbAP43meWzITJhoGXfxoAhkBewEF4x4QZqaowTmjzbchko9N97X/yQbRhrNA ImSjY75jArlh3P4WvJxt6CSq2eFG/pUb6dFuicj5exLngzk9JG3+oOBNZTaiEkVgP1d8 VW0rHsRZ6Lp5BNVpEKjEx5KJ5bw8GUJBEAXS5TOsXJl2K77U7oBnknsAPh1rXoSbln87 yaf/9ms1aZYCgCW1qLWuFDH6hT7XNDObPyRg/4cz2jm+Z6IkcAJZ7HSdd02kUIZheQ2Z Pwpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Dq9qt2AUp1awc/cLmelSzNKRNDnSx+0iG/3BD+f2Pqw=; b=4c65shb+6fUk60hKnK4VfaGpZI78lZw28WZeLhv9om6W+9S9bjMha2ZmdhR6UCYxNa O6a3zgToB8trFqxJCvnBshBik6dxpK2Kx1xOxHSu4Il11DDmNOMyem2IUI6L0e0WUl1l qCnZlmhygdKY5VcGLZrTesyrkD60tsu13xsSUxkfpdPkWk9RswzdqQIKo8TCw0Jh9CQS +Vfe7wUbCRwCuugGOXNnyLt5kpm5OcGqU8xIsDC5tkL/tGU21HEwz6ZKoilH01v9BRpG VXPgCRdj/aGdIEkZ8qWK6ItxpVmIvyhlV1shQ75Yp0x47yT/BpZgSW2kjT+UUjhYYNPB EGaA== X-Gm-Message-State: AOAM531Tyai8WyfQf6KM6ZQfZUw8z2i2rs/dmgHK7RShQXuDmkbpVe+W +oF53nIGkQ+MB8Rbs0LqQWYqWydM1s2cWgS6Ky/Ps4BHU36+iZxJrE869mPETO8W13P6CuNm1Rn lCillZG7UrZ3KgduDSioymb5+2a+aHqnjr/v8VbHv7x5X3Jstt1Z/nYnms4aTXKKznEn1NaUn X-Google-Smtp-Source: ABdhPJzaqkidQawY7QCK9X5rVK71lVLf3wgt7MJ0n5efK7otIJAGSETfWRZjLt8X/bYBHlNaQvv9LeMoFDFo X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:d1c5:0:b0:2ca:287c:6b81 with 
SMTP id t188-20020a0dd1c5000000b002ca287c6b81mr28230447ywd.38.1645593926233; Tue, 22 Feb 2022 21:25:26 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:21 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-46-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 45/47] mm: asi: Mapping global nonsensitive areas in asi_global_init From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse There are several areas in memory which we consider non-sensitive. These areas should be mapped in every ASI domain. We map these areas in asi_global_init(). We modified some of the linker scripts to ensure that these areas start and end on page boundaries.
The areas: - _stext --> _etext - __init_begin --> __init_end - __start_rodata --> __end_rodata - __start_once --> __end_once - __start___ex_table --> __stop___ex_table - __start_asi_nonsensitive --> __end_asi_nonsensitive - __start_asi_nonsensitive_readmostly --> __end_asi_nonsensitive_readmostly - __vvar_page --> + PAGE_SIZE - APIC_BASE --> + PAGE_SIZE - phys_base --> + PAGE_SIZE - __start___tracepoints_ptrs --> __stop___tracepoints_ptrs - __start___tracepoint_str --> __stop___tracepoint_str - __per_cpu_asi_start --> __per_cpu_asi_end (percpu) - irq_stack_backing_store --> + sizeof(irq_stack_backing_store) (percpu) The pgd's of the following addresses are cloned, modeled after KPTI: - CPU_ENTRY_AREA_BASE - ESPFIX_BASE_ADDR Signed-off-by: Ofir Weisse --- arch/x86/kernel/head_64.S | 12 +++++ arch/x86/kernel/vmlinux.lds.S | 2 +- arch/x86/mm/asi.c | 82 +++++++++++++++++++++++++++++++ include/asm-generic/vmlinux.lds.h | 13 +++-- 4 files changed, 105 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index d8b3ebd2bb85..3d3874661895 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -574,9 +574,21 @@ SYM_DATA_LOCAL(early_gdt_descr_base, .quad INIT_PER_CP= U_VAR(gdt_page)) =20 .align 16 /* This must match the first entry in level2_kernel_pgt */ + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +/* TODO: Find a way to mark .section for phys_base */ +/* Ideally, we want to map phys_base in .data..asi_non_sensitive. That doe= sn't + * seem to work properly. For now, we just make sure phys_base is in it's = own + * page. 
*/ + .align PAGE_SIZE +#endif SYM_DATA(phys_base, .quad 0x0) EXPORT_SYMBOL(phys_base) =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + .align PAGE_SIZE +#endif + #include "../../x86/xen/xen-head.S" =20 __PAGE_ALIGNED_BSS diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 3d6dc12d198f..2b3668291785 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -148,8 +148,8 @@ SECTIONS } :text =3D0xcccc =20 /* End of text section, which should occupy whole number of pages */ - _etext =3D .; . =3D ALIGN(PAGE_SIZE); + _etext =3D .; =20 X86_ALIGN_RODATA_BEGIN RO_DATA(PAGE_SIZE) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 04628949e89d..7f2aa1823736 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -9,6 +9,7 @@ =20 #include #include +#include /* struct irq_stack */ #include =20 #include "mm_internal.h" @@ -17,6 +18,24 @@ #undef pr_fmt #define pr_fmt(fmt) "ASI: " fmt =20 +#include +#include + +extern struct exception_table_entry __start___ex_table[]; +extern struct exception_table_entry __stop___ex_table[]; + +extern const char __start_asi_nonsensitive[], __end_asi_nonsensitive[]; +extern const char __start_asi_nonsensitive_readmostly[], + __end_asi_nonsensitive_readmostly[]; +extern const char __per_cpu_asi_start[], __per_cpu_asi_end[]; +extern const char *__start___tracepoint_str[]; +extern const char *__stop___tracepoint_str[]; +extern const char *__start___tracepoints_ptrs[]; +extern const char *__stop___tracepoints_ptrs[]; +extern const char __vvar_page[]; + +DECLARE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store); + static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); =20 @@ -412,6 +431,7 @@ void asi_unload_module(struct module* module) static int __init asi_global_init(void) { uint i, n; + int err =3D 0; =20 if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; @@ -436,6 +456,68 @@ static int __init 
asi_global_init(void) =20 pcpu_map_asi_reserved_chunk(); =20 + + /* + * TODO: We need to ensure that all the sections mapped below are + * actually page-aligned by the linker. For now, we temporarily just + * align the start/end addresses here, but that is incorrect as the + * rest of the page could potentially contain sensitive data. + */ +#define MAP_SECTION(start, end) \ + pr_err("%s:%d mapping 0x%lx --> 0x%lx", \ + __FUNCTION__, __LINE__, start, end); \ + err =3D asi_map(ASI_GLOBAL_NONSENSITIVE, \ + (void*)((unsigned long)(start) & PAGE_MASK),\ + PAGE_ALIGN((unsigned long)(end)) - \ + ((unsigned long)(start) & PAGE_MASK)); \ + BUG_ON(err); + +#define MAP_SECTION_PERCPU(start, size) \ + pr_err("%s:%d mapping PERCPU 0x%lx --> 0x%lx", \ + __FUNCTION__, __LINE__, start, (unsigned long)start+size); \ + err =3D asi_map_percpu(ASI_GLOBAL_NONSENSITIVE, = \ + (void*)((unsigned long)(start) & PAGE_MASK), \ + PAGE_ALIGN((unsigned long)(size))); \ + BUG_ON(err); + + MAP_SECTION(_stext, _etext); + MAP_SECTION(__init_begin, __init_end); + MAP_SECTION(__start_rodata, __end_rodata); + MAP_SECTION(__start_once, __end_once); + MAP_SECTION(__start___ex_table, __stop___ex_table); + MAP_SECTION(__start_asi_nonsensitive, __end_asi_nonsensitive); + MAP_SECTION(__start_asi_nonsensitive_readmostly, + __end_asi_nonsensitive_readmostly); + MAP_SECTION(__vvar_page, __vvar_page + PAGE_SIZE); + MAP_SECTION(APIC_BASE, APIC_BASE + PAGE_SIZE); + MAP_SECTION(&phys_base, (void *)&phys_base + PAGE_SIZE); + + /* TODO: add a build flag to enable/disable mapping only when + * instrumentation is used */ + MAP_SECTION(__start___tracepoints_ptrs, __stop___tracepoints_ptrs); + MAP_SECTION(__start___tracepoint_str, __stop___tracepoint_str); + + MAP_SECTION_PERCPU((void*)__per_cpu_asi_start, + __per_cpu_asi_end - __per_cpu_asi_start); + + MAP_SECTION_PERCPU(&irq_stack_backing_store, + sizeof(irq_stack_backing_store)); + + /* We have to map the stack canary into ASI.
This is far from ideal, as + * attackers can use L1TF to steal the canary value, and then perhaps + * mount some other attack including a buffer overflow. This is a price + * we must pay to use ASI. + */ + MAP_SECTION_PERCPU(&fixed_percpu_data, PAGE_SIZE); + +#define CLONE_INIT_PGD(addr) \ + asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, addr); + + CLONE_INIT_PGD(CPU_ENTRY_AREA_BASE); +#ifdef CONFIG_X86_ESPFIX64 + CLONE_INIT_PGD(ESPFIX_BASE_ADDR); +#endif + return 0; } subsys_initcall(asi_global_init) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinu= x.lds.h index 0a931aedc285..7152ce3613f5 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -235,8 +235,10 @@ #define TRACE_PRINTKS() __start___trace_bprintk_fmt =3D .; \ KEEP(*(__trace_printk_fmt)) /* Trace_printk fmt' pointer */ \ __stop___trace_bprintk_fmt =3D .; -#define TRACEPOINT_STR() __start___tracepoint_str =3D .; \ +#define TRACEPOINT_STR() . =3D ALIGN(PAGE_SIZE); \ + __start___tracepoint_str =3D .; \ KEEP(*(__tracepoint_str)) /* Trace_printk fmt' pointer */ \ + . =3D ALIGN(PAGE_SIZE); \ __stop___tracepoint_str =3D .; #else #define TRACE_PRINTKS() @@ -348,8 +350,10 @@ MEM_KEEP(init.data*) \ MEM_KEEP(exit.data*) \ *(.data.unlikely) \ + . =3D ALIGN(PAGE_SIZE); \ __start_once =3D .; \ *(.data.once) \ + . =3D ALIGN(PAGE_SIZE); \ __end_once =3D .; \ STRUCT_ALIGN(); \ *(__tracepoints) \ @@ -453,9 +457,10 @@ *(.rodata) *(.rodata.*) \ SCHED_DATA \ RO_AFTER_INIT_DATA /* Read only after init */ \ - . =3D ALIGN(8); \ + . =3D ALIGN(PAGE_SIZE); \ __start___tracepoints_ptrs =3D .; \ KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \ + . =3D ALIGN(PAGE_SIZE); \ __stop___tracepoints_ptrs =3D .; \ *(__tracepoints_strings)/* Tracepoints: strings */ \ } \ @@ -671,11 +676,13 @@ */ #define EXCEPTION_TABLE(align) \ . =3D ALIGN(align); \ + . 
=3D ALIGN(PAGE_SIZE); \ __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { \ __start___ex_table =3D .; \ KEEP(*(__ex_table)) \ + . =3D ALIGN(PAGE_SIZE); \ __stop___ex_table =3D .; \ - } + } \ =20 /* * .BTF --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EADB9C433F5 for ; Wed, 23 Feb 2022 05:29:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237530AbiBWF32 (ORCPT ); Wed, 23 Feb 2022 00:29:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238478AbiBWF1V (ORCPT ); Wed, 23 Feb 2022 00:27:21 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D5536FA00 for ; Tue, 22 Feb 2022 21:25:35 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id w1-20020a05690204e100b006244315a721so15103484ybs.0 for ; Tue, 22 Feb 2022 21:25:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=cWpgk2PD8ywov80MgvYy738g6oDVkPJMa4mg6Kd57WI=; b=EONmeIZhbgQ2Ix8S9q3J8G/HTiRJQe/uAmut/mIfTv5CIy/cqk4x/lLfaofIL6ZsPO 2DK7UMkGosmFYOPK8XbKYJ2B7I4QKSD7X8xoOad3cTZw5YniQxiy63dmB34fBM3ScXK9 b4M3PhJD3t0ZCy25IHGFNrU1Wai96D7oCn70Yeg5Az606EQRIrVrKV03yc49SjAoBqEY D/bmDYBLa30k0Q0T1/0E+zVvHZ7GWXBtnOemyeM6dCZ+R2nj996pZd5wNPpUZxVxnqxJ WaqOqmkCjUVkdvW9sUwO7Bbj6HXZvuS7EIm1omaeZny3PaSOUoJHhQmMblrL8LNsc6wd q1rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; 
bh=cWpgk2PD8ywov80MgvYy738g6oDVkPJMa4mg6Kd57WI=; b=ESNwcdSrhhzBzcM6W3/JrafRFhqX3vAFFQORC5ayZmuSAZcJsDDvmYfu765Us7vUVL rmbF2jH3u4QM9lTj7wBIIdpSn/p15J9GxQkOi54ylP4ryK4Jg9WHI4Bs6Q8CS2j6KjnH o389/Kb4kJxVEHAr9WG1vZvHuoJkBCScqZPMCvIdipgV/bsLFcqNRfzlj3SZH5tH7PHq xPb8BHm0mN6oQc8GoUjaZoDKBAW9u2j0DQQG94iSqSqX7apR2FPWM2vbdyjKDjENaA2R ATvw1enppYrHGPuPkyoJ+ylu85IVzZrC0/NY0WERBZuegu01wlRhTCo+UGLK6yj1YnPo fsog== X-Gm-Message-State: AOAM5308wCf0ZPaGpOIAQ7o+N+hT9jBEHFLiTLp4wW50Ta9MRa4GXOEp Vy9lOeFfTl6k9wqJLjCLeVxUf5RR0FDvHEVD9BAhgqEivp/VVxaKF2Wi4AZyPBbQLlTL+KOEKqB m09ycLQK2y8/yRYQB7rpip6Q4lja1L7IYx5Hm1uZhLLUzWjA3jTmMDYjDe1d4awh0mZrrJfnr X-Google-Smtp-Source: ABdhPJxkXSiWBnVLfkr470oqcQZfrXanhJKcgx/q87EPE8RXSdLDTEk6wPn7sTX4KYfxfNXBGCkwiM0xCmzr X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:bf87:0:b0:622:1e66:e7fd with SMTP id l7-20020a25bf87000000b006221e66e7fdmr25540509ybk.341.1645593928498; Tue, 22 Feb 2022 21:25:28 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:22 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-47-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 46/47] kvm: asi: Do asi_exit() in vcpu_run loop before returning to userspace From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse For the time being, we switch to the full kernel address space before returning back to userspace. 
Once KPTI is also implemented using ASI, we could potentially also switch to the KPTI address space directly. Signed-off-by: Ofir Weisse --- arch/x86/kvm/x86.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 680725089a18..294f73e9e71e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10148,13 +10148,17 @@ static int vcpu_run(struct kvm_vcpu *vcpu) srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); r =3D xfer_to_guest_mode_handle_work(vcpu); if (r) - return r; + goto exit; vcpu->srcu_idx =3D srcu_read_lock(&kvm->srcu); } } =20 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); =20 +exit: + /* TODO(oweisse): trace this exit if we're still within an ASI. */ + asi_exit(); + return r; } =20 --=20 2.35.1.473.g83b2b277ed-goog From nobody Fri Jun 26 13:44:01 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DE59C433F5 for ; Wed, 23 Feb 2022 05:28:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238452AbiBWF3U (ORCPT ); Wed, 23 Feb 2022 00:29:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58046 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238483AbiBWF1W (ORCPT ); Wed, 23 Feb 2022 00:27:22 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20E766AA79 for ; Tue, 22 Feb 2022 21:25:37 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-2d6914a097cso161867247b3.10 for ; Tue, 22 Feb 2022 21:25:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; 
bh=TaqdfVv20HSlc0A1c0PRFCOb1G6wQiL20chjexIyKqk=; b=rGs+4FxaDjpCpPLEg7H9QRppi6us6PQKtKoDaRWUcIklSnWA7hAYHi0ajBlLinkdGD RR60+N9t//ZgngiTNXXIeQGAWtu9hbb85MnwRsz6HZkNODh5q+Hvo26B+KyLDLD0wO72 Km+o4S/5IA7Nm5fHy35QBZ6gA5Jj2sdkDbtwen0aMy6pdcJNCDB87jD1BH+xf2J+Qqo1 uywNmgZooXGbPVFc7sIEcTr3h+ZPEiF9YjXU/CTb1IwHcqN0Ljdos6xBZ49r2iJJ80jO IU+HktHwWbxx7NyomQ4O+OP+COIay05VRI+prWu+WSkdgnQPlZ0X+27JswkXnsvP2ZOJ /p0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=TaqdfVv20HSlc0A1c0PRFCOb1G6wQiL20chjexIyKqk=; b=kvZFm6d3TrgQrKgw5mmXk6Eb4nz5UBELd1Q6CpIyOwOfrHoxfOQ7WuzrPA/8srHQje yMucUtFZTAcawshnUdlIHoNUDGqdnTXKUs2aizilOSVY05XQLQiCQgUztJrbrz2w1+t/ qYdxr4AalyQiHdRmp5RdOHbMjSc1Pkv2jExCVdSUMlIaHqBUcqnUOxl3KjqN1KSsH5Zd 8vO/pBgzvDyFy6HwDvdkHNOQ8Z+Vx7+BZhJID/vj/Kfzkn/zCKeEXnCbc1RE7yeoO/et EC8Ix2wfIL/PZXZazc/4Wxoae5VhQoeiweBipBoB1PWnNp2r3aygmWOdx3HOGJ3prtNr B9MA== X-Gm-Message-State: AOAM530KyirmBn1QrqyZIzqFiiGEaczGBoO8UupZOOpae15DCpiq6+la Wk5GT9exS24JQoBr4MfioyuTOluRJqYdXB+67wkDCkA6YNsIdQkEYTxT4QbUUJTvhgrvz17Vfuq g5eMRJ/AFW6mYjePc8rTpEtShTQfugO0bRWTxIlEdT0h9mse5F6bY5DcC86fh/8tB+vBEEYGZ X-Google-Smtp-Source: ABdhPJwmqi34TRvmt135zAgWCIHRaV4KjaCBBqxLVhnQGHlMtw+LypDzffAe2O8Zb/1XiXdFw/2ftIFsbgRk X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6fc1:0:b0:624:43a0:c16c with SMTP id k184-20020a256fc1000000b0062443a0c16cmr21683087ybc.222.1645593930719; Tue, 22 Feb 2022 21:25:30 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:23 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-48-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 47/47] mm: asi: Properly un/mapping task stack from ASI + tlb flush From: Junaid Shahid To: 
linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ofir Weisse The task stack must be correctly mapped into and unmapped from ASI at several locations, especially since a task_struct might be reused, potentially with a different mm: 1. Map in vcpu_run() @ arch/x86/kvm/x86.c 2. Unmap in release_task_stack() @ kernel/fork.c 3. Unmap in do_exit() @ kernel/exit.c 4. Unmap in begin_new_exec() @ fs/exec.c Signed-off-by: Ofir Weisse --- arch/x86/include/asm/asi.h | 6 ++++ arch/x86/kvm/x86.c | 6 ++++ arch/x86/mm/asi.c | 59 ++++++++++++++++++++++++++++++++++++++ fs/exec.c | 7 ++++- include/asm-generic/asi.h | 16 +++++++++-- include/linux/sched.h | 5 ++++ kernel/exit.c | 2 +- kernel/fork.c | 22 +++++++++++++- 8 files changed, 118 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 6148e65fb0c2..9d8f43981678 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -87,6 +87,12 @@ void asi_unmap_user(struct asi *asi, void *va, size_t le= n); int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t fl= ags); void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool); =20 +int asi_map_task_stack(struct task_struct *tsk, struct asi *asi); +void asi_unmap_task_stack(struct task_struct *tsk); +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm); +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order); + static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) { pool->pgtbl_list =3D NULL; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 294f73e9e71e..718104eefaed 100644
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10122,6 +10122,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu) vcpu->srcu_idx =3D srcu_read_lock(&kvm->srcu); vcpu->arch.l1tf_flush_l1d =3D true; =20 + /* We must have current->stack mapped into asi. This function can be + * safely called many times, as it will only do the actual mapping once. = */ + r =3D asi_map_task_stack(current, vcpu->kvm->asi); + if (r !=3D 0) + return r; + for (;;) { if (kvm_vcpu_running(vcpu)) { r =3D vcpu_enter_guest(vcpu); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 7f2aa1823736..a86ac6644a57 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -1029,6 +1029,45 @@ void asi_unmap(struct asi *asi, void *addr, size_t l= en, bool flush_tlb) asi_flush_tlb_range(asi, addr, len); } =20 +int asi_map_task_stack(struct task_struct *tsk, struct asi *asi) +{ + int ret; + + /* If the stack is already mapped to asi - don't need to map it ag= ain. */ + if (tsk->asi_stack_mapped) + return 0; + + if (!tsk->mm) + return -EINVAL; + + /* If the stack was allocated via the page allocator, we assume the + * stack pages were marked with PageNonSensitive, therefore tsk->s= tack + * address is properly aliased. */ + ret =3D asi_map(ASI_LOCAL_NONSENSITIVE, tsk->stack, THREAD_SIZE); + if (!ret) { + tsk->asi_stack_mapped =3D asi; + asi_sync_mapping(asi, tsk->stack, THREAD_SIZE); + } + + return ret; +} + +void asi_unmap_task_stack(struct task_struct *tsk) +{ + /* No need to unmap if the stack was not mapped to begin with. 
*/ + if (!tsk->asi_stack_mapped) + return; + + if (!tsk->mm) + return; + + asi_unmap(ASI_LOCAL_NONSENSITIVE, tsk->stack, THREAD_SIZE, + /* flush_tlb =3D */ true); + + tsk->asi_stack_mapped =3D NULL; +} + + void *asi_va(unsigned long pa) { struct page *page =3D pfn_to_page(PHYS_PFN(pa)); @@ -1336,3 +1375,23 @@ void asi_unmap_user(struct asi *asi, void *addr, siz= e_t len) } } EXPORT_SYMBOL_GPL(asi_unmap_user); + +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm) +{ + uint i; + for (i =3D 0; i < (1 << order); i++) { + __SetPageLocalNonSensitive(pages + i); + pages[i].asi_mm =3D mm; + } +} + +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order) +{ + uint i; + for (i =3D 0; i < (1 << order); i++) { + __ClearPageLocalNonSensitive(pages + i); + pages[i].asi_mm =3D NULL; + } +} + diff --git a/fs/exec.c b/fs/exec.c index 76f3b433e80d..fb9182cf3f33 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -69,6 +69,7 @@ #include #include #include +#include =20 #include #include "internal.h" @@ -1238,7 +1239,11 @@ int begin_new_exec(struct linux_binprm * bprm) struct task_struct *me =3D current; int retval; =20 - /* TODO: (oweisse) unmap the stack from ASI */ + /* The old mm is about to be released later on in exec_mmap. We are + * reusing the task, including its stack which was mapped to + * mm->asi_pgd[0]. We need to asi_unmap the stack, so the destruct= or of + * the mm won't complain on "lingering" asi mappings. 
*/ + asi_unmap_task_stack(current); =20 /* Once we are committed compute the creds */ retval =3D bprm_creds_from_file(bprm); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 2763cb1a974c..6e9a261a2b9d 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -66,8 +66,13 @@ static inline struct asi *asi_get_target(void) { return = NULL; } =20 static inline struct asi *asi_get_current(void) { return NULL; } =20 -static inline -int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +static inline int asi_map_task_stack(struct task_struct *tsk, struct asi *= asi) +{ return 0; } + +static inline void asi_unmap_task_stack(struct task_struct *tsk) { } + +static inline int asi_map_gfp(struct asi *asi, void *addr, size_t len, + gfp_t gfp_flags) { return 0; } @@ -130,6 +135,13 @@ static inline int asi_load_module(struct module* modul= e) {return 0;} =20 static inline void asi_unload_module(struct module* module) { } =20 +static inline +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm) { } + +static inline +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order) { } + #endif /* !_ASSEMBLY_ */ =20 #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 78c351e35fec..87ad45e52b19 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -67,6 +67,7 @@ struct sighand_struct; struct signal_struct; struct task_delay_info; struct task_group; +struct asi; =20 /* * Task state bitmask. NOTE! 
These bits are also @@ -1470,6 +1471,10 @@ struct task_struct { int mce_count; #endif =20 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct asi *asi_stack_mapped; +#endif + #ifdef CONFIG_KRETPROBES struct llist_head kretprobe_instances; #endif diff --git a/kernel/exit.c b/kernel/exit.c index ab2749cf6887..f21cc21814d1 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -768,7 +768,7 @@ void __noreturn do_exit(long code) profile_task_exit(tsk); kcov_task_exit(tsk); =20 - /* TODO: (oweisse) unmap the stack from ASI */ + asi_unmap_task_stack(tsk); =20 coredump_task_exit(tsk); ptrace_event(PTRACE_EVENT_EXIT, code); diff --git a/kernel/fork.c b/kernel/fork.c index cb147a72372d..876fefc477cb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -216,7 +216,6 @@ static int free_vm_stack_cache(unsigned int cpu) =20 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) { - /* TODO: (oweisse) Add annotation to map the stack into ASI */ #ifdef CONFIG_VMAP_STACK void *stack; int i; @@ -269,7 +268,16 @@ static unsigned long *alloc_thread_stack_node(struct t= ask_struct *tsk, int node) struct page *page =3D alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER); =20 + /* When marking pages as PageLocalNonSensitive we set the page->mm = to be + * NULL. We must make sure the flag is cleared from the stack pages + * before free_pages is called. Otherwise, page->mm will be access= ed + * which will result in a NULL dereference. page_address() below will = yield + * an aliased address after ASI_LOCAL_MAP, thanks to the + * PageLocalNonSensitive flag. */ if (likely(page)) { + asi_mark_pages_local_nonsensitive(page, + THREAD_SIZE_ORDER, + NULL); tsk->stack =3D kasan_reset_tag(page_address(page)); return tsk->stack; } @@ -301,6 +309,14 @@ static inline void free_thread_stack(struct task_struc= t *tsk) } #endif =20 + /* We must clear the PageNonSensitive flag before calling free_pag= es().
+ * Otherwise page->mm (which is NULL) will be accessed, in order to + * unmap the pages from ASI. Specifically for the stack, we assume= the + * pages were already unmapped from ASI before we got here, via + * asi_unmap_task_stack(). */ + asi_clear_pages_local_nonsensitive(virt_to_page(tsk->stack), + THREAD_SIZE_ORDER); + __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER); } # else @@ -436,6 +452,7 @@ static void release_task_stack(struct task_struct *tsk) if (WARN_ON(READ_ONCE(tsk->__state) !=3D TASK_DEAD)) return; /* Better to leak the stack than to free prematurely */ =20 + asi_unmap_task_stack(tsk); account_kernel_stack(tsk, -1); free_thread_stack(tsk); tsk->stack =3D NULL; @@ -916,6 +933,9 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) * functions again. */ tsk->stack =3D stack; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + tsk->asi_stack_mapped =3D NULL; +#endif #ifdef CONFIG_VMAP_STACK tsk->stack_vm_area =3D stack_vm_area; #endif --=20 2.35.1.473.g83b2b277ed-goog