From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:28 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-2-yuzhao@google.com>
Subject: [PATCH v2 1/6] mm/hugetlb_vmemmap: batch-update PTEs
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

Convert vmemmap_remap_walk->remap_pte to ->remap_pte_range so that
vmemmap remap walks can batch-update PTEs.

The goal of this conversion is to allow architectures to implement
their own optimizations if possible, e.g., only to stop remote CPUs
once for each batch when updating vmemmap on arm64. It is not intended
to change the remap workflow nor should it by itself have any side
effects on performance.

Signed-off-by: Yu Zhao
---
 mm/hugetlb_vmemmap.c | 163 ++++++++++++++++++++++++-------------------
 1 file changed, 91 insertions(+), 72 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 57b7f591eee8..46befab48d41 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -22,7 +22,7 @@
 /**
  * struct vmemmap_remap_walk - walk vmemmap page table
  *
- * @remap_pte:		called for each lowest-level entry (PTE).
+ * @remap_pte_range:	called on a range of PTEs.
  * @nr_walked:		the number of walked pte.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
@@ -32,8 +32,8 @@
  *			operations.
  */
 struct vmemmap_remap_walk {
-	void			(*remap_pte)(pte_t *pte, unsigned long addr,
-					     struct vmemmap_remap_walk *walk);
+	void			(*remap_pte_range)(pte_t *pte, unsigned long start,
+					unsigned long end, struct vmemmap_remap_walk *walk);
 	unsigned long		nr_walked;
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
@@ -101,10 +101,6 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 	struct page *head;
 	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
 
-	/* Only splitting, not remapping the vmemmap pages. */
-	if (!vmemmap_walk->remap_pte)
-		walk->action = ACTION_CONTINUE;
-
 	spin_lock(&init_mm.page_table_lock);
 	head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
 	/*
@@ -129,33 +125,36 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 		ret = -ENOTSUPP;
 	}
 	spin_unlock(&init_mm.page_table_lock);
-	if (!head || ret)
+	if (ret)
 		return ret;
 
-	return vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
-}
+	if (head) {
+		ret = vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
+		if (ret)
+			return ret;
+	}
 
-static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
+	if (vmemmap_walk->remap_pte_range) {
+		pte_t *pte = pte_offset_kernel(pmd, addr);
 
-	/*
-	 * The reuse_page is found 'first' in page table walking before
-	 * starting remapping.
-	 */
-	if (!vmemmap_walk->reuse_page)
-		vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
-	else
-		vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
-	vmemmap_walk->nr_walked++;
+		vmemmap_walk->nr_walked += (next - addr) / PAGE_SIZE;
+		/*
+		 * The reuse_page is found 'first' in page table walking before
+		 * starting remapping.
+		 */
+		if (!vmemmap_walk->reuse_page) {
+			vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
+			pte++;
+			addr += PAGE_SIZE;
+		}
+		vmemmap_walk->remap_pte_range(pte, addr, next, vmemmap_walk);
+	}
 
 	return 0;
 }
 
 static const struct mm_walk_ops vmemmap_remap_ops = {
 	.pmd_entry	= vmemmap_pmd_entry,
-	.pte_entry	= vmemmap_pte_entry,
 };
 
 static int vmemmap_remap_range(unsigned long start, unsigned long end,
@@ -172,7 +171,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 	if (ret)
 		return ret;
 
-	if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
+	if (walk->remap_pte_range && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_kernel_range(start, end);
 
 	return 0;
@@ -204,33 +203,45 @@ static void free_vmemmap_page_list(struct list_head *list)
 		free_vmemmap_page(page);
 }
 
-static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
-			      struct vmemmap_remap_walk *walk)
+static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				    struct vmemmap_remap_walk *walk)
 {
-	/*
-	 * Remap the tail pages as read-only to catch illegal write operation
-	 * to the tail pages.
-	 */
-	pgprot_t pgprot = PAGE_KERNEL_RO;
-	struct page *page = pte_page(ptep_get(pte));
-	pte_t entry;
-
-	/* Remapping the head page requires r/w */
-	if (unlikely(addr == walk->reuse_addr)) {
-		pgprot = PAGE_KERNEL;
-		list_del(&walk->reuse_page->lru);
+	int i;
+	struct page *page;
+	int nr_pages = (end - start) / PAGE_SIZE;
 
+	for (i = 0; i < nr_pages; i++) {
+		page = pte_page(ptep_get(pte + i));
+
+		list_add(&page->lru, walk->vmemmap_pages);
+	}
+
+	page = walk->reuse_page;
+
+	if (start == walk->reuse_addr) {
+		list_del(&page->lru);
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
 		/*
-		 * Makes sure that preceding stores to the page contents from
-		 * vmemmap_remap_free() become visible before the set_pte_at()
-		 * write.
+		 * Makes sure that preceding stores to the page contents become
+		 * visible before set_pte_at().
 		 */
 		smp_wmb();
 	}
 
-	entry = mk_pte(walk->reuse_page, pgprot);
-	list_add(&page->lru, walk->vmemmap_pages);
-	set_pte_at(&init_mm, addr, pte, entry);
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		/*
+		 * The head page must be mapped read-write; the tail pages are
+		 * mapped read-only to catch illegal modifications.
+		 */
+		if (!i && start == walk->reuse_addr)
+			val = mk_pte(page, PAGE_KERNEL);
+		else
+			val = mk_pte(page, PAGE_KERNEL_RO);
+
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /*
@@ -252,27 +263,39 @@ static inline void reset_struct_pages(struct page *start)
 	memcpy(start, from, sizeof(*from) * NR_RESET_STRUCT_PAGE);
 }
 
-static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
-				struct vmemmap_remap_walk *walk)
+static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				      struct vmemmap_remap_walk *walk)
 {
-	pgprot_t pgprot = PAGE_KERNEL;
+	int i;
 	struct page *page;
-	void *to;
-
-	BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
+	int nr_pages = (end - start) / PAGE_SIZE;
 
 	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
-	list_del(&page->lru);
-	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
-	reset_struct_pages(to);
+
+	for (i = 0; i < nr_pages; i++) {
+		BUG_ON(pte_page(ptep_get(pte + i)) != walk->reuse_page);
+
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
+		reset_struct_pages(page_to_virt(page));
+
+		page = list_next_entry(page, lru);
+	}
 
 	/*
 	 * Makes sure that preceding stores to the page contents become visible
-	 * before the set_pte_at() write.
+	 * before set_pte_at().
 	 */
 	smp_wmb();
-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+		list_del(&page->lru);
+
+		val = mk_pte(page, PAGE_KERNEL);
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /**
@@ -290,7 +313,6 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
 			       unsigned long reuse)
 {
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= NULL,
 		.flags		= VMEMMAP_SPLIT_NO_TLB_FLUSH,
 	};
 
@@ -322,10 +344,10 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 {
 	int ret;
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_remap_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_remap_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= vmemmap_pages,
+		.flags			= flags,
 	};
 	int nid = page_to_nid((struct page *)reuse);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
@@ -340,8 +362,6 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	walk.reuse_page = alloc_pages_node(nid, gfp_mask, 0);
 	if (walk.reuse_page) {
-		copy_page(page_to_virt(walk.reuse_page),
-			  (void *)walk.reuse_addr);
 		list_add(&walk.reuse_page->lru, vmemmap_pages);
 		memmap_pages_add(1);
 	}
@@ -371,10 +391,9 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 * They will be restored in the following call.
 	 */
 	walk = (struct vmemmap_remap_walk) {
-		.remap_pte	= vmemmap_restore_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= vmemmap_pages,
-		.flags		= 0,
+		.remap_pte_range	= vmemmap_restore_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= vmemmap_pages,
 	};
 
 	vmemmap_remap_range(reuse, end, &walk);
@@ -425,10 +444,10 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_restore_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= &vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_restore_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= &vmemmap_pages,
+		.flags			= flags,
 	};
 
 	/* See the comment in the vmemmap_remap_free(). */
-- 
2.47.0.277.g8800431eea-goog
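[Editorial note: the payoff of the range-based callback is easiest to see
side by side. With the old ->remap_pte, any per-update architecture work
would have to run once per PTE; with ->remap_pte_range it can run once per
batch. A minimal sketch of a batch callback, where my_arch_enter() and
my_arch_exit() are hypothetical stand-ins for whatever an architecture
wants to amortize (e.g., stopping remote CPUs); this is not code from the
patch above:

	/* Illustrative sketch only. */
	static void example_remap_pte_range(pte_t *pte, unsigned long start,
					    unsigned long end,
					    struct vmemmap_remap_walk *walk)
	{
		unsigned long addr;

		my_arch_enter();	/* hypothetical: once per batch, not per PTE */

		for (addr = start; addr != end; addr += PAGE_SIZE, pte++) {
			pte_t val = mk_pte(walk->reuse_page, PAGE_KERNEL_RO);

			set_pte_at(&init_mm, addr, pte, val);
		}

		my_arch_exit();		/* hypothetical */
	}
]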
From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:29 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-3-yuzhao@google.com>
Subject: [PATCH v2 2/6] mm/hugetlb_vmemmap: add arch-independent helpers
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

Add architecture-independent helpers to allow individual architectures
to work around their own limitations when updating vmemmap.
Specifically, the current remap workflow requires break-before-make
(BBM) on arm64. By overriding the default helpers later in this
series, arm64 will be able to support the current HVO implementation.
Signed-off-by: Yu Zhao
---
 include/linux/mm_types.h |  7 +++
 mm/hugetlb_vmemmap.c     | 99 ++++++++++++++++++++++++++++++++++------
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..0f3ae6e173f6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1499,4 +1499,11 @@ enum {
 /* See also internal only FOLL flags in mm/internal.h */
 };
 
+/* Skip the TLB flush when we split the PMD */
+#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
+/* Skip the TLB flush when we remap the PTE */
+#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
+/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
+#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 46befab48d41..e50a196399f5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -38,16 +38,56 @@ struct vmemmap_remap_walk {
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
 	struct list_head	*vmemmap_pages;
-
-/* Skip the TLB flush when we split the PMD */
-#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
-/* Skip the TLB flush when we remap the PTE */
-#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
-/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
-#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
 	unsigned long		flags;
 };
 
+#ifndef VMEMMAP_ARCH_TLB_FLUSH_FLAGS
+#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS	0
+#endif
+
+#ifndef vmemmap_update_supported
+static bool vmemmap_update_supported(void)
+{
+	return true;
+}
+#endif
+
+#ifndef vmemmap_update_lock
+static void vmemmap_update_lock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_unlock
+static void vmemmap_update_unlock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_start
+static void vmemmap_update_pte_range_start(pte_t *pte, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_end
+static void vmemmap_update_pte_range_end(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_start
+static void vmemmap_update_pmd_range_start(pmd_t *pmd, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_end
+static void vmemmap_update_pmd_range_end(void)
+{
+}
+#endif
+
 static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 			     struct vmemmap_remap_walk *walk)
 {
@@ -83,7 +123,9 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 
 		/* Make pte visible before pmd. See comment in pmd_install(). */
 		smp_wmb();
+		vmemmap_update_pmd_range_start(pmd, start, start + PMD_SIZE);
 		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		vmemmap_update_pmd_range_end();
 		if (!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH))
 			flush_tlb_kernel_range(start, start + PMD_SIZE);
 	} else {
@@ -164,10 +206,12 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 
 	VM_BUG_ON(!PAGE_ALIGNED(start | end));
 
+	vmemmap_update_lock();
 	mmap_read_lock(&init_mm);
 	ret = walk_page_range_novma(&init_mm, start, end, &vmemmap_remap_ops,
 				    NULL, walk);
 	mmap_read_unlock(&init_mm);
+	vmemmap_update_unlock();
 	if (ret)
 		return ret;
 
@@ -228,6 +272,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned lo
 		smp_wmb();
 	}
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -242,6 +288,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned lo
 
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /*
@@ -287,6 +335,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned
 	 */
 	smp_wmb();
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -296,6 +346,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned
 		val = mk_pte(page, PAGE_KERNEL);
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /**
@@ -513,7 +565,8 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
  */
 int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio)
 {
-	return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU);
+	return __hugetlb_vmemmap_restore_folio(h, folio,
+			VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 }
 
 /**
@@ -553,7 +606,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 			list_move(&folio->lru, non_hvo_folios);
 	}
 
-	if (restored)
+	if (restored && !(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_all();
 	if (!ret)
 		ret = restored;
@@ -641,7 +694,8 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
 {
 	LIST_HEAD(vmemmap_pages);
 
-	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHRONIZE_RCU);
+	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
+			VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
@@ -683,7 +737,8 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 			break;
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_SPLIT_NO_TLB_FLUSH))
+		flush_tlb_all();
 
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret;
@@ -701,24 +756,35 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 		 * allowing more vmemmap remaps to occur.
 		 */
 		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
-			flush_tlb_all();
+			if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+				flush_tlb_all();
 			free_vmemmap_page_list(&vmemmap_pages);
 			INIT_LIST_HEAD(&vmemmap_pages);
 			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
 		}
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+		flush_tlb_all();
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
+static int hugetlb_vmemmap_sysctl(const struct ctl_table *ctl, int write,
+				  void *buffer, size_t *lenp, loff_t *ppos)
+{
+	if (!vmemmap_update_supported())
+		return -ENODEV;
+
+	return proc_dobool(ctl, write, buffer, lenp, ppos);
+}
+
 static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 	{
 		.procname	= "hugetlb_optimize_vmemmap",
 		.data		= &vmemmap_optimize_enabled,
 		.maxlen		= sizeof(vmemmap_optimize_enabled),
 		.mode		= 0644,
-		.proc_handler	= proc_dobool,
+		.proc_handler	= hugetlb_vmemmap_sysctl,
 	},
 };
 
@@ -729,6 +795,11 @@ static int __init hugetlb_vmemmap_init(void)
 	/* HUGETLB_VMEMMAP_RESERVE_SIZE should cover all used struct pages */
 	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
 
+	if (READ_ONCE(vmemmap_optimize_enabled) && !vmemmap_update_supported()) {
+		pr_warn("HugeTLB: disabling HVO due to missing support.\n");
+		WRITE_ONCE(vmemmap_optimize_enabled, false);
+	}
+
 	for_each_hstate(h) {
 		if (hugetlb_vmemmap_optimizable(h)) {
 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
-- 
2.47.0.277.g8800431eea-goog
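[Editorial note: the #ifndef guards above implement the usual kernel
pattern for arch overrides: each default is compiled only if an
architecture has not already defined a macro with the same name. A
minimal sketch of what an override looks like from the architecture side
(patch 5 does this for real on arm64); illustrative only:

	/* In an arch header, e.g. <asm/pgalloc.h>: */
	#define vmemmap_update_lock vmemmap_update_lock
	static inline void vmemmap_update_lock(void)
	{
		/* arch-specific serialization, e.g.: */
		cpus_read_lock();
	}

Because the macro is now defined, the corresponding #ifndef default in
mm/hugetlb_vmemmap.c is skipped and the arch version is used instead.]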
From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:30 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-4-yuzhao@google.com>
Subject: [PATCH v2 3/6] irqchip/gic-v3: support SGI broadcast
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

GIC v3 and later support SGI broadcast, i.e., the mode that routes
interrupts to all PEs in the system excluding the local CPU.
Supporting this mode can avoid looping through all the remote CPUs
when broadcasting SGIs, especially for systems with 200+ CPUs.
The performance improvement can be measured with the rest of this
series booted with "hugetlb_free_vmemmap=on irqchip.gicv3_pseudo_nmi=1":

  cd /sys/kernel/mm/hugepages/
  echo 600 >hugepages-1048576kB/nr_hugepages
  echo 2048kB >hugepages-1048576kB/demote_size
  perf record -g time echo 600 >hugepages-1048576kB/demote

With 80 CPUs:

            gic_ipi_send_mask()  bash sys time
  Before:                38.14%      0m10.513s
  After:                  0.20%       0m5.132s

Signed-off-by: Yu Zhao
---
 drivers/irqchip/irq-gic-v3.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index ce87205e3e82..7ebe870e4608 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1322,6 +1322,7 @@ static void gic_cpu_init(void)
 
 #define MPIDR_TO_SGI_RS(mpidr)		(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
+#define MPIDR_TO_SGI_TARGET_LIST(mpidr)	(1 << ((mpidr) & 0xf))
 
 /*
  * gic_starting_cpu() is called after the last point where cpuhp is allowed
@@ -1356,7 +1357,7 @@ static u16 gic_compute_target_list(int *base_cpu, const struct cpumask *mask,
 		mpidr = gic_cpu_to_affinity(cpu);
 
 	while (cpu < nr_cpu_ids) {
-		tlist |= 1 << (mpidr & 0xf);
+		tlist |= MPIDR_TO_SGI_TARGET_LIST(mpidr);
 
 		next_cpu = cpumask_next(cpu, mask);
 		if (next_cpu >= nr_cpu_ids)
@@ -1394,9 +1395,20 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
 	gic_write_sgi1r(val);
 }
 
+static void gic_broadcast_sgi(unsigned int irq)
+{
+	u64 val;
+
+	val = BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) | (irq << ICC_SGI1R_SGI_ID_SHIFT);
+
+	pr_devel("CPU %d: broadcasting SGI %u\n", smp_processor_id(), irq);
+	gic_write_sgi1r(val);
+}
+
 static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 {
-	int cpu;
+	int cpu = smp_processor_id();
+	bool self = cpumask_test_cpu(cpu, mask);
 
 	if (WARN_ON(d->hwirq >= 16))
 		return;
@@ -1407,6 +1419,19 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 	 */
 	dsb(ishst);
 
+	if (cpumask_weight(mask) + !self == num_online_cpus()) {
+		/* Broadcast to all but self */
+		gic_broadcast_sgi(d->hwirq);
+		if (self) {
+			unsigned long mpidr = gic_cpu_to_affinity(cpu);
+
+			/* Send to self */
+			gic_send_sgi(MPIDR_TO_SGI_CLUSTER_ID(mpidr),
+				     MPIDR_TO_SGI_TARGET_LIST(mpidr), d->hwirq);
+		}
+		goto done;
+	}
+
 	for_each_cpu(cpu, mask) {
 		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(gic_cpu_to_affinity(cpu));
 		u16 tlist;
@@ -1414,7 +1439,7 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 		tlist = gic_compute_target_list(&cpu, mask, cluster_id);
 		gic_send_sgi(cluster_id, tlist, d->hwirq);
 	}
-
+done:
 	/* Force the above writes to ICC_SGI1R_EL1 to be executed */
 	isb();
 }
-- 
2.47.0.277.g8800431eea-goog
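[Editorial note: the guard in gic_ipi_send_mask() deserves a second look.
cpumask_weight(mask) + !self == num_online_cpus() is true exactly when the
mask covers every online CPU except, at most, the sender itself. Restated
as a standalone helper with a worked check; a sketch, not code from the
patch:

	/*
	 * With 8 CPUs online and CPU 3 sending:
	 *   mask = all 8 CPUs:     8 + 0 == 8  ->  broadcast, plus an SGI to self
	 *   mask = all but CPU 3:  7 + 1 == 8  ->  broadcast only
	 *   mask = CPUs 0-5:       6 + 0 != 8  ->  fall back to the per-cluster loop
	 */
	static bool sgi_can_broadcast(const struct cpumask *mask, int cpu)
	{
		bool self = cpumask_test_cpu(cpu, mask);

		return cpumask_weight(mask) + !self == num_online_cpus();
	}
]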
From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:31 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-5-yuzhao@google.com>
Subject: [PATCH v2 4/6] arm64: broadcast IPIs to pause remote CPUs
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

Broadcast pseudo-NMI IPIs to pause remote CPUs for a short period of
time, and then reliably resume them when the local CPU exits critical
sections that preclude the execution of remote CPUs. A typical example
of such critical sections is BBM on kernel PTEs.

HugeTLB Vmemmap Optimization (HVO) on arm64 was disabled by commit
060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
due to the following reason:

  This is deemed UNPREDICTABLE by the Arm architecture without a
  break-before-make sequence (make the PTE invalid, TLBI, write the
  new valid PTE). However, such sequence is not possible since the
  vmemmap may be concurrently accessed by the kernel.

Supporting BBM on kernel PTEs is one of the approaches that can make
HVO safe on arm64.

Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/smp.h |  3 ++
 arch/arm64/kernel/smp.c      | 85 +++++++++++++++++++++++++++++++++---
 2 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 2510eec026f7..cffb0cfed961 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
 extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
+void pause_remote_cpus(void);
+void resume_remote_cpus(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 3b3f6b56e733..54e9f6374aa3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -85,7 +85,12 @@ static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
-static bool crash_stop;
+enum {
+	SEND_STOP,
+	CRASH_STOP,
+};
+
+static unsigned long stop_in_progress;
 
 static void ipi_setup(int cpu);
 
@@ -917,6 +922,72 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs
 #endif
 }
 
+static DEFINE_RAW_SPINLOCK(cpu_pause_lock);
+static bool __cacheline_aligned_in_smp cpu_paused;
+static atomic_t __cacheline_aligned_in_smp nr_cpus_paused;
+
+static void pause_local_cpu(void)
+{
+	atomic_inc(&nr_cpus_paused);
+
+	while (READ_ONCE(cpu_paused))
+		cpu_relax();
+
+	atomic_dec(&nr_cpus_paused);
+
+	/*
+	 * The caller of resume_remote_cpus() should make sure that clearing
+	 * cpu_paused is ordered after other changes that can have any impact on
+	 * this CPU. The isb() below makes sure this CPU doesn't speculatively
+	 * execute the next instruction before it sees all those changes.
+	 */
+	isb();
+}
+
+void pause_remote_cpus(void)
+{
+	cpumask_t cpus_to_pause;
+	int nr_cpus_to_pause = num_online_cpus() - 1;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	if (!nr_cpus_to_pause)
+		return;
+
+	cpumask_copy(&cpus_to_pause, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_pause);
+
+	raw_spin_lock(&cpu_pause_lock);
+
+	WARN_ON_ONCE(cpu_paused);
+	WARN_ON_ONCE(atomic_read(&nr_cpus_paused));
+
+	cpu_paused = true;
+
+	smp_cross_call(&cpus_to_pause, IPI_CPU_STOP_NMI);
+
+	while (atomic_read(&nr_cpus_paused) != nr_cpus_to_pause)
+		cpu_relax();
+
+	raw_spin_unlock(&cpu_pause_lock);
+}
+
+void resume_remote_cpus(void)
+{
+	if (!cpu_paused)
+		return;
+
+	raw_spin_lock(&cpu_pause_lock);
+
+	WRITE_ONCE(cpu_paused, false);
+
+	while (atomic_read(&nr_cpus_paused))
+		cpu_relax();
+
+	raw_spin_unlock(&cpu_pause_lock);
+}
+
 static void arm64_backtrace_ipi(cpumask_t *mask)
 {
 	__ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
@@ -970,7 +1041,9 @@ static void do_handle_IPI(int ipinr)
 
 	case IPI_CPU_STOP:
 	case IPI_CPU_STOP_NMI:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
+		if (!test_bit(SEND_STOP, &stop_in_progress)) {
+			pause_local_cpu();
+		} else if (test_bit(CRASH_STOP, &stop_in_progress)) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
 			unreachable();
 		} else {
@@ -1142,7 +1215,6 @@ static inline unsigned int num_other_online_cpus(void)
 
 void smp_send_stop(void)
 {
-	static unsigned long stop_in_progress;
 	cpumask_t mask;
 	unsigned long timeout;
 
@@ -1154,7 +1226,7 @@ void smp_send_stop(void)
 		goto skip_ipi;
 
 	/* Only proceed if this is the first CPU to reach this code */
-	if (test_and_set_bit(0, &stop_in_progress))
+	if (test_and_set_bit(SEND_STOP, &stop_in_progress))
 		return;
 
 	/*
@@ -1230,12 +1302,11 @@ void crash_smp_send_stop(void)
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
 	 *
-	 * We use this same boolean to tell whether the IPI we send was a
+	 * We use the CRASH_STOP bit to tell whether the IPI we send was a
 	 * stop or a "crash stop".
	 */
-	if (crash_stop)
+	if (test_and_set_bit(CRASH_STOP, &stop_in_progress))
 		return;
-	crash_stop = 1;
 
 	smp_send_stop();
 
-- 
2.47.0.277.g8800431eea-goog
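[Editorial note: the intended calling convention for the new pair,
spelled out; patch 5 is the real user. The caller must hold
cpus_read_lock() and disable IRQs; every kernel-PTE break-before-make
step then happens while all other CPUs spin in pause_local_cpu(). The
names addr, ptep and new_pte below are hypothetical; this is a sketch,
not code from the patch:

	cpus_read_lock();			/* keep the online mask stable */
	local_irq_disable();
	pause_remote_cpus();			/* remote CPUs spin in the NMI handler */

	pte_clear(&init_mm, addr, ptep);		/* break */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);	/* invalidate */
	set_pte_at(&init_mm, addr, ptep, new_pte);	/* make */

	resume_remote_cpus();
	local_irq_enable();
	cpus_read_unlock();
]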
From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:32 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-6-yuzhao@google.com>
Subject: [PATCH v2 5/6] arm64: pause remote CPUs to update vmemmap
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

Pause remote CPUs so that the local CPU can follow the proper BBM
sequence to safely update the vmemmap mapping `struct page` areas.

While updating the vmemmap, it is guaranteed that neither the local
CPU nor the remote ones will access the `struct page` area being
updated, and therefore they should not trigger kernel page faults.

Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/pgalloc.h | 69 ++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 8ff5f2a2579e..f50f79f57c1e 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -12,6 +12,7 @@
 #include <asm/pgtable-hwdef.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
+#include <asm/smp.h>
 
 #define __HAVE_ARCH_PGD_FREE
 #define __HAVE_ARCH_PUD_FREE
@@ -137,4 +138,72 @@ pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
 	__pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE | PMD_TABLE_PXN);
 }
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+
+#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS	(VMEMMAP_SPLIT_NO_TLB_FLUSH | VMEMMAP_REMAP_NO_TLB_FLUSH)
+
+#define vmemmap_update_supported vmemmap_update_supported
+static inline bool vmemmap_update_supported(void)
+{
+	return system_uses_irq_prio_masking();
+}
+
+#define vmemmap_update_lock vmemmap_update_lock
+static inline void vmemmap_update_lock(void)
+{
+	cpus_read_lock();
+}
+
+#define vmemmap_update_unlock vmemmap_update_unlock
+static inline void vmemmap_update_unlock(void)
+{
+	cpus_read_unlock();
+}
+
+#define vmemmap_update_pte_range_start vmemmap_update_pte_range_start
+static inline void vmemmap_update_pte_range_start(pte_t *pte,
+						  unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+
+	local_irq_disable();
+	pause_remote_cpus();
+
+	for (addr = start; addr != end; addr += PAGE_SIZE, pte++)
+		pte_clear(&init_mm, addr, pte);
+
+	flush_tlb_kernel_range(start, end);
+}
+
+#define vmemmap_update_pte_range_end vmemmap_update_pte_range_end
+static inline void vmemmap_update_pte_range_end(void)
+{
+	resume_remote_cpus();
+	local_irq_enable();
+}
+
+#define vmemmap_update_pmd_range_start vmemmap_update_pmd_range_start
+static inline void vmemmap_update_pmd_range_start(pmd_t *pmd,
+						  unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+
+	local_irq_disable();
+	pause_remote_cpus();
+
+	for (addr = start; addr != end; addr += PMD_SIZE, pmd++)
+		pmd_clear(pmd);
+
+	flush_tlb_kernel_range(start, end);
+}
+
+#define vmemmap_update_pmd_range_end vmemmap_update_pmd_range_end
+static inline void vmemmap_update_pmd_range_end(void)
+{
+	resume_remote_cpus();
+	local_irq_enable();
+}
+
+#endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */
+
 #endif
-- 
2.47.0.277.g8800431eea-goog
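[Editorial note: putting patches 1, 2 and 5 together, one HVO remap on
arm64 now proceeds as sketched below, assuming a single batch; some_page
is a hypothetical stand-in and the real entry point is
vmemmap_remap_range() in mm/hugetlb_vmemmap.c. This is an editorial
sketch, not code from the series:

	static void example_hvo_update(pte_t *pte, unsigned long start, unsigned long end)
	{
		unsigned long addr;

		vmemmap_update_lock();				/* cpus_read_lock() */

		/* IRQs off, remote CPUs paused, PTEs cleared, TLB flushed (break) */
		vmemmap_update_pte_range_start(pte, start, end);

		for (addr = start; addr != end; addr += PAGE_SIZE, pte++)
			set_pte_at(&init_mm, addr, pte,
				   mk_pte(some_page, PAGE_KERNEL_RO));	/* make */

		vmemmap_update_pte_range_end();			/* resume CPUs, IRQs on */

		vmemmap_update_unlock();			/* cpus_read_unlock() */
	}

While the remote CPUs spin in the pseudo-NMI handler, nothing can observe
the cleared (broken) PTEs, which is what makes BBM on kernel PTEs safe.]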
From nobody Sun Nov 24 02:30:03 2024
Date: Thu, 7 Nov 2024 13:20:33 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Message-ID: <20241107202033.2721681-7-yuzhao@google.com>
Subject: [PATCH v2 6/6] arm64: select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

To use HVO, make sure that the kernel is booted with pseudo-NMI
enabled by "irqchip.gicv3_pseudo_nmi=1", as well as
"hugetlb_free_vmemmap=on" unless HVO is enabled by default. Note that
HVO checks the pseudo-NMI capability and is disabled at runtime if
pseudo-NMI turns out to be unsupported.

Successfully enabling HVO should have the following:

  # dmesg | grep NMI
  GICv3: Pseudo-NMIs enabled using ...

  # sysctl vm.hugetlb_optimize_vmemmap
  vm.hugetlb_optimize_vmemmap = 1

For comparison purposes, the whole series was measured against this
patch only, to show the overhead from pausing remote CPUs:

  HugeTLB operations            This patch only  The whole series  Change
  Alloc 600 1GB                        0m3.526s          0m3.649s     +4%
  Free 600 1GB                         0m0.880s          0m0.917s     +4%
  Demote 600 1GB to 307200 2MB         0m1.575s          0m3.640s   +231%
  Free 307200 2MB                      0m0.946s          0m2.921s   +309%

Signed-off-by: Yu Zhao
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..e93745f819d9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -109,6 +109,7 @@ config ARM64
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
+	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_EXECMEM_LATE if EXECMEM
 	select ARCH_WANTS_NO_INSTR
-- 
2.47.0.277.g8800431eea-goog