From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Zi Yan, Yang Shi, Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple, haoxin, Minchan Kim
Subject: [PATCH -v3 5/9] migrate_pages: batch _unmap and _move
Date: Mon, 16 Jan 2023 14:30:53 +0800
Message-Id: <20230116063057.653862-6-ying.huang@intel.com>
In-Reply-To: <20230116063057.653862-1-ying.huang@intel.com>
References: <20230116063057.653862-1-ying.huang@intel.com>

In this patch the _unmap and _move stages of folio migration are batched.
Previously, the flow was:

  for each folio
    _unmap()
    _move()

Now it is:

  for each folio
    _unmap()
  for each folio
    _move()

Based on this, we can batch the TLB flushing and use some hardware
accelerator to copy folios between the batched _unmap and the batched
_move stages.
Signed-off-by: "Huang, Ying"
Cc: Zi Yan
Cc: Yang Shi
Cc: Baolin Wang
Cc: Oscar Salvador
Cc: Matthew Wilcox
Cc: Bharata B Rao
Cc: Alistair Popple
Cc: haoxin
Cc: Minchan Kim
---
 mm/migrate.c | 207 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 183 insertions(+), 24 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 0428449149f4..143d96775b4d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1033,6 +1033,33 @@ static void __migrate_folio_extract(struct folio *dst,
 	dst->private = NULL;
 }
 
+/* Restore the source folio to the original state upon failure */
+static void migrate_folio_undo_src(struct folio *src,
+				   int page_was_mapped,
+				   struct anon_vma *anon_vma,
+				   struct list_head *ret)
+{
+	if (page_was_mapped)
+		remove_migration_ptes(src, src, false);
+	/* Drop an anon_vma reference if we took one */
+	if (anon_vma)
+		put_anon_vma(anon_vma);
+	folio_unlock(src);
+	list_move_tail(&src->lru, ret);
+}
+
+/* Restore the destination folio to the original state upon failure */
+static void migrate_folio_undo_dst(struct folio *dst,
+				   free_page_t put_new_page,
+				   unsigned long private)
+{
+	folio_unlock(dst);
+	if (put_new_page)
+		put_new_page(&dst->page, private);
+	else
+		folio_put(dst);
+}
+
 /* Cleanup src folio upon migration success */
 static void migrate_folio_done(struct folio *src,
 			       enum migrate_reason reason)
@@ -1052,7 +1079,7 @@ static void migrate_folio_done(struct folio *src,
 }
 
 static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
-				 int force, enum migrate_mode mode)
+				 int force, bool force_lock, enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
@@ -1079,6 +1106,17 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 		if (current->flags & PF_MEMALLOC)
 			goto out;
 
+		/*
+		 * We have locked some folios, to avoid deadlock, we cannot
+		 * lock the folio synchronously.  Go out to process (and
+		 * unlock) all the locked folios.  Then we can lock the folio
+		 * synchronously.
+		 */
+		if (!force_lock) {
+			rc = -EDEADLOCK;
+			goto out;
+		}
+
 		folio_lock(src);
 	}
 
@@ -1191,6 +1229,10 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
 
 	rc = move_to_new_folio(dst, src, mode);
+
+	if (rc != -EAGAIN)
+		list_del(&dst->lru);
+
 	if (unlikely(!is_lru))
 		goto out_unlock_both;
 
@@ -1209,6 +1251,11 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 		lru_add_drain();
 	}
 
+	if (rc == -EAGAIN) {
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return rc;
+	}
+
 	if (page_was_mapped)
 		remove_migration_ptes(src,
 			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
@@ -1233,7 +1280,7 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 /* Obtain the lock on page, remove all ptes. */
 static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 			       unsigned long private, struct folio *src,
-			       struct folio **dstp, int force,
+			       struct folio **dstp, int force, bool force_lock,
 			       enum migrate_mode mode, enum migrate_reason reason,
 			       struct list_head *ret)
 {
@@ -1261,7 +1308,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 	*dstp = dst;
 
 	dst->private = NULL;
-	rc = __migrate_folio_unmap(src, dst, force, mode);
+	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
 	if (rc == MIGRATEPAGE_UNMAP)
 		return rc;
 
@@ -1270,7 +1317,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 	 * references and be restored.
 	 */
 	/* restore the folio to right list. */
-	if (rc != -EAGAIN)
+	if (rc != -EAGAIN && rc != -EDEADLOCK)
 		list_move_tail(&src->lru, ret);
 
 	if (put_new_page)
@@ -1309,9 +1356,8 @@ static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
 		migrate_folio_done(src, reason);
-	} else {
-		if (rc != -EAGAIN)
-			list_add_tail(&src->lru, ret);
+	} else if (rc != -EAGAIN) {
+		list_add_tail(&src->lru, ret);
 
 		if (put_new_page)
 			put_new_page(&dst->page, private);
@@ -1591,7 +1637,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		enum migrate_mode mode, int reason, struct list_head *ret_folios,
 		struct migrate_pages_stats *stats)
 {
-	int retry = 1;
+	int retry;
 	int large_retry = 1;
 	int thp_retry = 1;
 	int nr_failed = 0;
@@ -1600,13 +1646,19 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	int pass = 0;
 	bool is_large = false;
 	bool is_thp = false;
-	struct folio *folio, *folio2, *dst = NULL;
-	int rc, nr_pages;
+	struct folio *folio, *folio2, *dst = NULL, *dst2;
+	int rc, rc_saved, nr_pages;
 	LIST_HEAD(split_folios);
+	LIST_HEAD(unmap_folios);
+	LIST_HEAD(dst_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
+	bool force_lock;
 
-split_folio_migration:
+retry:
+	rc_saved = 0;
+	force_lock = true;
+	retry = 1;
 	for (pass = 0;
	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
	     pass++) {
@@ -1628,16 +1680,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			cond_resched();
 
 			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
-						 folio, &dst, pass > 2, mode,
-						 reason, ret_folios);
-			if (rc == MIGRATEPAGE_UNMAP)
-				rc = migrate_folio_move(put_new_page, private,
-							folio, dst, mode,
-							reason, ret_folios);
+						 folio, &dst, pass > 2, force_lock,
+						 mode, reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
+			 *	Unmap: folio will be put on unmap_folios list,
+			 *	       dst folio put on dst_folios list
 			 *	-EAGAIN: stay on the from list
+			 *	-EDEADLOCK: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	-ENOSYS: stay on the from list
 			 *	Other errno: put on ret_folios list
@@ -1672,7 +1723,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			case -ENOMEM:
 				/*
 				 * When memory is low, don't bother to try to migrate
-				 * other folios, just exit.
+				 * other folios, move unmapped folios, then exit.
 				 */
 				if (is_large) {
 					nr_large_failed++;
@@ -1711,7 +1762,19 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				/* nr_failed isn't updated for not used */
 				nr_large_failed += large_retry;
 				stats->nr_thp_failed += thp_retry;
-				goto out;
+				rc_saved = rc;
+				if (list_empty(&unmap_folios))
+					goto out;
+				else
+					goto move;
+			case -EDEADLOCK:
+				/*
+				 * The folio cannot be locked for potential deadlock.
+				 * Go move (and unlock) all locked folios.  Then we can
+				 * try again.
+				 */
+				rc_saved = rc;
+				goto move;
 			case -EAGAIN:
 				if (is_large) {
 					large_retry++;
@@ -1725,6 +1788,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				stats->nr_succeeded += nr_pages;
 				stats->nr_thp_succeeded += is_thp;
 				break;
+			case MIGRATEPAGE_UNMAP:
+				/*
+				 * We have locked some folios, don't force lock
+				 * to avoid deadlock.
+				 */
+				force_lock = false;
+				list_move_tail(&folio->lru, &unmap_folios);
+				list_add_tail(&dst->lru, &dst_folios);
+				break;
 			default:
 				/*
 				 * Permanent failure (-EBUSY, etc.):
@@ -1748,12 +1820,95 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	nr_large_failed += large_retry;
 	stats->nr_thp_failed += thp_retry;
 	stats->nr_failed_pages += nr_retry_pages;
+move:
+	retry = 1;
+	for (pass = 0;
+	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
+	     pass++) {
+		retry = 0;
+		large_retry = 0;
+		thp_retry = 0;
+		nr_retry_pages = 0;
+
+		dst = list_first_entry(&dst_folios, struct folio, lru);
+		dst2 = list_next_entry(dst, lru);
+		list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+			is_large = folio_test_large(folio);
+			is_thp = is_large && folio_test_pmd_mappable(folio);
+			nr_pages = folio_nr_pages(folio);
+
+			cond_resched();
+
+			rc = migrate_folio_move(put_new_page, private,
+						folio, dst, mode,
+						reason, ret_folios);
+			/*
+			 * The rules are:
+			 *	Success: folio will be freed
+			 *	-EAGAIN: stay on the unmap_folios list
+			 *	Other errno: put on ret_folios list
+			 */
+			switch(rc) {
+			case -EAGAIN:
+				if (is_large) {
+					large_retry++;
+					thp_retry += is_thp;
+				} else if (!no_split_folio_counting) {
+					retry++;
+				}
+				nr_retry_pages += nr_pages;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				stats->nr_succeeded += nr_pages;
+				stats->nr_thp_succeeded += is_thp;
+				break;
+			default:
+				if (is_large) {
+					nr_large_failed++;
+					stats->nr_thp_failed += is_thp;
+				} else if (!no_split_folio_counting) {
+					nr_failed++;
+				}
+
+				stats->nr_failed_pages += nr_pages;
+				break;
+			}
+			dst = dst2;
+			dst2 = list_next_entry(dst, lru);
+		}
+	}
+	nr_failed += retry;
+	nr_large_failed += large_retry;
+	stats->nr_thp_failed += thp_retry;
+	stats->nr_failed_pages += nr_retry_pages;
+
+	if (rc_saved)
+		rc = rc_saved;
+	else
+		rc = nr_failed + nr_large_failed;
 out:
+	/* Cleanup remaining folios */
+	dst = list_first_entry(&dst_folios, struct folio, lru);
+	dst2 = list_next_entry(dst, lru);
+	list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+		int page_was_mapped = 0;
+		struct anon_vma *anon_vma = NULL;
+
+		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
+				       ret_folios);
+		list_del(&dst->lru);
+		migrate_folio_undo_dst(dst, put_new_page, private);
+		dst = dst2;
+		dst2 = list_next_entry(dst, lru);
+	}
+
 	/*
 	 * Try to migrate split folios of fail-to-migrate large folios, no
 	 * nr_failed counting in this round, since all split folios of a
 	 * large folio is counted as 1 failure in the first round.
 	 */
-	if (!list_empty(&split_folios)) {
+	if (rc >= 0 && !list_empty(&split_folios)) {
 		/*
 		 * Move non-migrated folios (after NR_MAX_MIGRATE_PAGES_RETRY
 		 * retries) to ret_folios to avoid migrating them again.
@@ -1761,12 +1916,16 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		list_splice_init(from, ret_folios);
 		list_splice_init(&split_folios, from);
 		no_split_folio_counting = true;
-		retry = 1;
-		goto split_folio_migration;
+		goto retry;
 	}
 
-	rc = nr_failed + nr_large_failed;
-out:
+	/*
+	 * We have unlocked all locked folios, so we can force lock now, let's
+	 * try again.
+	 */
+	if (rc == -EDEADLOCK)
+		goto retry;
+
 	return rc;
 }
 
-- 
2.35.1