From nobody Sun Feb  8 15:48:23 2026
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Huang Ying <ying.huang@intel.com>,
        Hugh Dickins <hughd@google.com>,
        "Xu, Pengfei" <pengfei.xu@intel.com>,
        Christoph Hellwig <hch@lst.de>,
        Stefan Roesch <shr@devkernel.io>, Tejun Heo <tj@kernel.org>,
        Xin Hao <xhao@linux.alibaba.com>, Zi Yan <ziy@nvidia.com>,
        Yang Shi <shy828301@gmail.com>,
        Baolin Wang <baolin.wang@linux.alibaba.com>,
        Matthew Wilcox <willy@infradead.org>,
        Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 1/3] migrate_pages: fix deadlock in batched migration
Date: Fri, 24 Feb 2023 22:11:43 +0800
Message-Id: <20230224141145.96814-2-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230224141145.96814-1-ying.huang@intel.com>
References: <20230224141145.96814-1-ying.huang@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Two deadlock bugs were reported for the migrate_pages() batching
series.  Thanks Hugh and Pengfei!  For example, consider the following
deadlock trace snippet:

 INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
       Not tainted 6.2.0-rc4-kvm+ #1314
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u4:0    state:D stack:0     pid:9     ppid:2      flags:0x00004000
 Workqueue: loop4 loop_rootcg_workfn
 Call Trace:
  <TASK>
  __schedule+0x43b/0xd00
  schedule+0x6a/0xf0
  io_schedule+0x4a/0x80
  folio_wait_bit_common+0x1b5/0x4e0
  ? __pfx_wake_page_function+0x10/0x10
  __filemap_get_folio+0x73d/0x770
  shmem_get_folio_gfp+0x1fd/0xc80
  shmem_write_begin+0x91/0x220
  generic_perform_write+0x10e/0x2e0
  __generic_file_write_iter+0x17e/0x290
  ? generic_write_checks+0x12b/0x1a0
  generic_file_write_iter+0x97/0x180
  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
  do_iter_readv_writev+0x13c/0x210
  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
  do_iter_write+0xf6/0x330
  vfs_iter_write+0x46/0x70
  loop_process_work+0x723/0xfe0
  loop_rootcg_workfn+0x28/0x40
  process_one_work+0x3cc/0x8d0
  worker_thread+0x66/0x630
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x153/0x190
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>

 INFO: task repro:1023 blocked for more than 147 seconds.
       Not tainted 6.2.0-rc4-kvm+ #1314
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:repro           state:D stack:0     pid:1023  ppid:360    flags:0x00004004
 Call Trace:
  <TASK>
  __schedule+0x43b/0xd00
  schedule+0x6a/0xf0
  io_schedule+0x4a/0x80
  folio_wait_bit_common+0x1b5/0x4e0
  ? compaction_alloc+0x77/0x1150
  ? __pfx_wake_page_function+0x10/0x10
  folio_wait_bit+0x30/0x40
  folio_wait_writeback+0x2e/0x1e0
  migrate_pages_batch+0x555/0x1ac0
  ? __pfx_compaction_alloc+0x10/0x10
  ? __pfx_compaction_free+0x10/0x10
  ? __this_cpu_preempt_check+0x17/0x20
  ? lock_is_held_type+0xe6/0x140
  migrate_pages+0x100e/0x1180
  ? __pfx_compaction_free+0x10/0x10
  ? __pfx_compaction_alloc+0x10/0x10
  compact_zone+0xe10/0x1b50
  ? lock_is_held_type+0xe6/0x140
  ? check_preemption_disabled+0x80/0xf0
  compact_node+0xa3/0x100
  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
  ? _find_first_bit+0x7b/0x90
  sysctl_compaction_handler+0x5d/0xb0
  proc_sys_call_handler+0x29d/0x420
  proc_sys_write+0x2b/0x40
  vfs_write+0x3a3/0x780
  ksys_write+0xb7/0x180
  __x64_sys_write+0x26/0x30
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 RIP: 0033:0x7f3a2471f59d
 RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
 RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
 R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
 R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
  </TASK>

The page migration task holds the lock of shmem folio A and is waiting
for the writeback of folio B, which belongs to the file system on the
loop block device, to complete.  Meanwhile, the loop worker task that
writes back folio B is waiting to lock shmem folio A, because folio A
backs folio B in the loop device.  Thus, a deadlock is triggered.

In general, if we have locked any folio other than the one we are
migrating, it is not safe to wait synchronously, for example, for
writeback to complete or for a buffer head lock.
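
The rule can be written as a small predicate.  This is a userspace
sketch only: model_may_wait() and enum model_mode are hypothetical
names, not kernel APIs, and the model ignores everything except the
migrate mode and how many other folios the task already holds locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MIGRATE_ASYNC and the synchronous modes. */
enum model_mode { MODEL_MIGRATE_ASYNC, MODEL_MIGRATE_SYNC };

/*
 * May the migration task sleep on this folio (its lock, its writeback
 * bit, a buffer head lock)?  nr_other_locked counts the folios the task
 * already holds locked besides the one being migrated.
 */
static bool model_may_wait(enum model_mode mode, int nr_other_locked)
{
	assert(nr_other_locked >= 0);

	/* Async mode never sleeps; it gives up and lets the caller retry. */
	if (mode == MODEL_MIGRATE_ASYNC)
		return false;

	/*
	 * Sleeping while holding another folio lock is the cycle in the
	 * trace: we hold folio A and wait on folio B, while the task that
	 * must finish folio B is waiting for folio A.
	 */
	return nr_other_locked == 0;
}
```

With a batch of one folio (nr_other_locked == 0), synchronous waiting
is safe; that is the property the fix below relies on.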

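To fix the deadlock, this patch stops batching page migration except
in MIGRATE_ASYNC mode, in which synchronous waiting is avoided.  For
the same reason, the retry of split folios, which may hold more than
one folio locked, is forced into MIGRATE_ASYNC mode.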
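
The batch-size selection in migrate_pages() can be modeled in
userspace.  In this sketch an array of per-folio page counts stands in
for the from list, MODEL_NR_MAX_BATCHED is a hypothetical stand-in for
NR_MAX_BATCHED_MIGRATION (its value here is arbitrary), and the return
value is how many folios would be cut into the next batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NR_MAX_BATCHED_MIGRATION; value chosen for the model. */
#define MODEL_NR_MAX_BATCHED 512

enum model_mode { MODEL_MIGRATE_ASYNC, MODEL_MIGRATE_SYNC };

/*
 * Return how many folios from the front of nr_pages[0..n) go into the
 * next batch: folios are added until their page total reaches the batch
 * size, mirroring the ">= batch" check and the cut before folio2.
 */
static size_t model_next_batch(const int *nr_pages, size_t n,
			       enum model_mode mode)
{
	int batch = (mode == MODEL_MIGRATE_ASYNC) ? MODEL_NR_MAX_BATCHED : 1;
	int total = 0;
	size_t i;

	assert(n == 0 || nr_pages != NULL);
	for (i = 0; i < n; i++) {
		total += nr_pages[i];
		if (total >= batch)
			return i + 1;	/* the folio reaching the limit is included */
	}
	return n;		/* the whole remaining list fits in one batch */
}
```

For example, with page counts {1, 1, 1}, synchronous modes take one
folio per batch, while MIGRATE_ASYNC takes all three at once.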

The fix can be improved further, for example by restoring batching for
synchronous migration.  We will do that as soon as possible.

Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Tejun Heo <tj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/migrate.c | 62 ++++++++++++++++------------------------------------
 1 file changed, 19 insertions(+), 43 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 37865f85df6d..7ac37dbbf307 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1106,7 +1106,7 @@ static void migrate_folio_done(struct folio *src,
 /* Obtain the lock on page, remove all ptes. */
 static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 			       unsigned long private, struct folio *src,
-			       struct folio **dstp, int force, bool avoid_force_lock,
+			       struct folio **dstp, int force,
 			       enum migrate_mode mode, enum migrate_reason reason,
 			       struct list_head *ret)
 {
@@ -1157,17 +1157,6 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		if (current->flags & PF_MEMALLOC)
 			goto out;
 
-		/*
-		 * We have locked some folios and are going to wait to lock
-		 * this folio.  To avoid a potential deadlock, let's bail
-		 * out and not do that. The locked folios will be moved and
-		 * unlocked, then we can wait to lock this folio.
-		 */
-		if (avoid_force_lock) {
-			rc = -EDEADLOCK;
-			goto out;
-		}
-
 		folio_lock(src);
 	}
 	locked = true;
@@ -1247,7 +1236,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		/* Establish migration ptes */
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_vma, src);
-		try_to_migrate(src, TTU_BATCH_FLUSH);
+		try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
 		page_was_mapped = 1;
 	}
 
@@ -1261,7 +1250,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 	 * A folio that has not been unmapped will be restored to
 	 * right list unless we want to retry.
 	 */
-	if (rc == -EAGAIN || rc == -EDEADLOCK)
+	if (rc == -EAGAIN)
 		ret = NULL;
 
 	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
@@ -1634,11 +1623,9 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	LIST_HEAD(dst_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
-	bool avoid_force_lock;
 
 retry:
 	rc_saved = 0;
-	avoid_force_lock = false;
 	retry = 1;
 	for (pass = 0;
 	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
@@ -1683,15 +1670,14 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			}
 
 			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
-						 folio, &dst, pass > 2, avoid_force_lock,
-						 mode, reason, ret_folios);
+						 folio, &dst, pass > 2, mode,
+						 reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
 			 *	Unmap: folio will be put on unmap_folios list,
 			 *	       dst folio put on dst_folios list
 			 *	-EAGAIN: stay on the from list
-			 *	-EDEADLOCK: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	Other errno: put on ret_folios list
 			 */
@@ -1743,14 +1729,6 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 					goto out;
 				else
 					goto move;
-			case -EDEADLOCK:
-				/*
-				 * The folio cannot be locked for potential deadlock.
-				 * Go move (and unlock) all locked folios.  Then we can
-				 * try again.
-				 */
-				rc_saved = rc;
-				goto move;
 			case -EAGAIN:
 				if (is_large) {
 					large_retry++;
@@ -1765,11 +1743,6 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				stats->nr_thp_succeeded += is_thp;
 				break;
 			case MIGRATEPAGE_UNMAP:
-				/*
-				 * We have locked some folios, don't force lock
-				 * to avoid deadlock.
-				 */
-				avoid_force_lock =3D true;
 				list_move_tail(&folio->lru, &unmap_folios);
 				list_add_tail(&dst->lru, &dst_folios);
 				break;
@@ -1894,17 +1867,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		 */
 		list_splice_init(from, ret_folios);
 		list_splice_init(&split_folios, from);
+		/*
+		 * Force async mode to avoid to wait lock or bit when we have
+		 * locked more than one folios.
+		 */
+		mode = MIGRATE_ASYNC;
 		no_split_folio_counting = true;
 		goto retry;
 	}
 
-	/*
-	 * We have unlocked all locked folios, so we can force lock now, let's
-	 * try again.
-	 */
-	if (rc == -EDEADLOCK)
-		goto retry;
-
 	return rc;
 }
 
@@ -1939,7 +1910,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
 {
 	int rc, rc_gather;
-	int nr_pages;
+	int nr_pages, batch;
 	struct folio *folio, *folio2;
 	LIST_HEAD(folios);
 	LIST_HEAD(ret_folios);
@@ -1953,6 +1924,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				     mode, reason, &stats, &ret_folios);
 	if (rc_gather < 0)
 		goto out;
+
+	if (mode == MIGRATE_ASYNC)
+		batch = NR_MAX_BATCHED_MIGRATION;
+	else
+		batch = 1;
 again:
 	nr_pages = 0;
 	list_for_each_entry_safe(folio, folio2, from, lru) {
@@ -1963,11 +1939,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		}
 
 		nr_pages += folio_nr_pages(folio);
-		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+		if (nr_pages >= batch)
 			break;
 	}
-	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
-		list_cut_before(&folios, from, &folio->lru);
+	if (nr_pages >= batch)
+		list_cut_before(&folios, from, &folio2->lru);
 	else
 		list_splice_init(from, &folios);
 	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
-- 
2.39.1
From nobody Sun Feb  8 15:48:23 2026
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Huang Ying <ying.huang@intel.com>,
        Hugh Dickins <hughd@google.com>,
        "Xu, Pengfei" <pengfei.xu@intel.com>,
        Christoph Hellwig <hch@lst.de>,
        Stefan Roesch <shr@devkernel.io>, Tejun Heo <tj@kernel.org>,
        Xin Hao <xhao@linux.alibaba.com>, Zi Yan <ziy@nvidia.com>,
        Yang Shi <shy828301@gmail.com>,
        Baolin Wang <baolin.wang@linux.alibaba.com>,
        Matthew Wilcox <willy@infradead.org>,
        Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 2/3] migrate_pages: move split folios processing out of
 migrate_pages_batch()
Date: Fri, 24 Feb 2023 22:11:44 +0800
Message-Id: <20230224141145.96814-3-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230224141145.96814-1-ying.huang@intel.com>
References: <20230224141145.96814-1-ying.huang@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

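Move the processing of the split folios of fail-to-migrate large folios
out of migrate_pages_batch() and into migrate_pages(), to simplify the
code logic and reduce the number of lines.  migrate_pages_batch() now
takes the split_folios list and the number of retry passes (nr_pass)
from its caller, and migrate_pages() retries the split folios in one
MIGRATE_ASYNC pass without counting their failures again.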
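
The nr_pass parameter bounds how many times migrate_pages_batch()
retries folios that return -EAGAIN; migrate_pages() passes
NR_MAX_MIGRATE_PAGES_RETRY for the main batch and 1 for the split
folios.  A userspace sketch of that bound, where busy[i] is a
hypothetical count of how many passes folio i reports -EAGAIN before
it migrates:

```c
#include <assert.h>
#include <stddef.h>

/*
 * busy[i] >= 0: folio i returns -EAGAIN for busy[i] more passes, then
 * migrates.  busy[i] == -1 marks a folio that has migrated.  Returns the
 * number of folios left unmigrated after at most nr_pass passes.
 */
static int model_batch_passes(int *busy, size_t n, int nr_pass)
{
	int retry = 1;
	int pass, left = 0;
	size_t i;

	assert(nr_pass >= 0);
	for (pass = 0; pass < nr_pass && retry; pass++) {
		retry = 0;
		for (i = 0; i < n; i++) {
			if (busy[i] < 0)
				continue;	/* already migrated */
			if (busy[i] > 0) {
				busy[i]--;	/* -EAGAIN: try again next pass */
				retry++;
			} else {
				busy[i] = -1;	/* migrated in this pass */
			}
		}
	}
	for (i = 0; i < n; i++)
		if (busy[i] >= 0)
			left++;
	return left;
}
```

With nr_pass = 1, as used for the split folios, a folio that is busy on
its only pass stays on the list, and migrate_pages() moves it to
ret_folios.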

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Tejun Heo <tj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/migrate.c | 76 ++++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 50 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 7ac37dbbf307..91198b487e49 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1605,9 +1605,10 @@ static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
 static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		free_page_t put_new_page, unsigned long private,
 		enum migrate_mode mode, int reason, struct list_head *ret_folios,
-		struct migrate_pages_stats *stats)
+		struct list_head *split_folios, struct migrate_pages_stats *stats,
+		int nr_pass)
 {
-	int retry;
+	int retry = 1;
 	int large_retry = 1;
 	int thp_retry = 1;
 	int nr_failed = 0;
@@ -1617,19 +1618,12 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	bool is_large = false;
 	bool is_thp = false;
 	struct folio *folio, *folio2, *dst = NULL, *dst2;
-	int rc, rc_saved, nr_pages;
-	LIST_HEAD(split_folios);
+	int rc, rc_saved = 0, nr_pages;
 	LIST_HEAD(unmap_folios);
 	LIST_HEAD(dst_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
-	bool no_split_folio_counting = false;
 
-retry:
-	rc_saved = 0;
-	retry = 1;
-	for (pass = 0;
-	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
-	     pass++) {
+	for (pass = 0; pass < nr_pass && (retry || large_retry); pass++) {
 		retry = 0;
 		large_retry = 0;
 		thp_retry = 0;
@@ -1660,7 +1654,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			if (!thp_migration_supported() && is_thp) {
 				nr_large_failed++;
 				stats->nr_thp_failed++;
-				if (!try_split_folio(folio, &split_folios)) {
+				if (!try_split_folio(folio, split_folios)) {
 					stats->nr_thp_split++;
 					continue;
 				}
@@ -1692,7 +1686,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 					stats->nr_thp_failed += is_thp;
 					/* Large folio NUMA faulting doesn't split to retry. */
 					if (!nosplit) {
-						int ret = try_split_folio(folio, &split_folios);
+						int ret = try_split_folio(folio, split_folios);
 
 						if (!ret) {
 							stats->nr_thp_split += is_thp;
@@ -1709,18 +1703,11 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 							break;
 						}
 					}
-				} else if (!no_split_folio_counting) {
+				} else {
 					nr_failed++;
 				}
 
 				stats->nr_failed_pages += nr_pages + nr_retry_pages;
-				/*
-				 * There might be some split folios of fail-to-migrate large
-				 * folios left in split_folios list. Move them to ret_folios
-				 * list so that they could be put back to the right list by
-				 * the caller otherwise the folio refcnt will be leaked.
-				 */
-				list_splice_init(&split_folios, ret_folios);
 				/* nr_failed isn't updated for not used */
 				nr_large_failed += large_retry;
 				stats->nr_thp_failed += thp_retry;
@@ -1733,7 +1720,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				if (is_large) {
 					large_retry++;
 					thp_retry += is_thp;
-				} else if (!no_split_folio_counting) {
+				} else {
 					retry++;
 				}
 				nr_retry_pages += nr_pages;
@@ -1756,7 +1743,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				if (is_large) {
 					nr_large_failed++;
 					stats->nr_thp_failed += is_thp;
-				} else if (!no_split_folio_counting) {
+				} else {
 					nr_failed++;
 				}
 
@@ -1774,9 +1761,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	try_to_unmap_flush();
 
 	retry = 1;
-	for (pass = 0;
-	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
-	     pass++) {
+	for (pass = 0; pass < nr_pass && (retry || large_retry); pass++) {
 		retry = 0;
 		large_retry = 0;
 		thp_retry = 0;
@@ -1805,7 +1790,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				if (is_large) {
 					large_retry++;
 					thp_retry += is_thp;
-				} else if (!no_split_folio_counting) {
+				} else {
 					retry++;
 				}
 				nr_retry_pages += nr_pages;
@@ -1818,7 +1803,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				if (is_large) {
 					nr_large_failed++;
 					stats->nr_thp_failed += is_thp;
-				} else if (!no_split_folio_counting) {
+				} else {
 					nr_failed++;
 				}
 
@@ -1855,27 +1840,6 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		dst2 = list_next_entry(dst, lru);
 	}
 
-	/*
-	 * Try to migrate split folios of fail-to-migrate large folios, no
-	 * nr_failed counting in this round, since all split folios of a
-	 * large folio is counted as 1 failure in the first round.
-	 */
-	if (rc >= 0 && !list_empty(&split_folios)) {
-		/*
-		 * Move non-migrated folios (after NR_MAX_MIGRATE_PAGES_RETRY
-		 * retries) to ret_folios to avoid migrating them again.
-		 */
-		list_splice_init(from, ret_folios);
-		list_splice_init(&split_folios, from);
-		/*
-		 * Force async mode to avoid to wait lock or bit when we have
-		 * locked more than one folios.
-		 */
-		mode = MIGRATE_ASYNC;
-		no_split_folio_counting = true;
-		goto retry;
-	}
-
 	return rc;
 }
 
@@ -1914,6 +1878,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	struct folio *folio, *folio2;
 	LIST_HEAD(folios);
 	LIST_HEAD(ret_folios);
+	LIST_HEAD(split_folios);
 	struct migrate_pages_stats stats;
 
 	trace_mm_migrate_pages_start(mode, reason);
@@ -1947,12 +1912,23 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	else
 		list_splice_init(from, &folios);
 	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
-				 mode, reason, &ret_folios, &stats);
+				 mode, reason, &ret_folios, &split_folios, &stats,
+				 NR_MAX_MIGRATE_PAGES_RETRY);
 	list_splice_tail_init(&folios, &ret_folios);
 	if (rc < 0) {
 		rc_gather = rc;
+		list_splice_tail(&split_folios, &ret_folios);
 		goto out;
 	}
+	if (!list_empty(&split_folios)) {
+		/*
+		 * Failure isn't counted since all split folios of a large folio
+		 * is counted as 1 failure already.
+		 */
+		migrate_pages_batch(&split_folios, get_new_page, put_new_page, private,
+				    MIGRATE_ASYNC, reason, &ret_folios, NULL, &stats, 1);
+		list_splice_tail_init(&split_folios, &ret_folios);
+	}
 	rc_gather += rc;
 	if (!list_empty(from))
 		goto again;
-- 
2.39.1
From nobody Sun Feb  8 15:48:23 2026
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Huang Ying <ying.huang@intel.com>,
        Hugh Dickins <hughd@google.com>,
        "Xu, Pengfei" <pengfei.xu@intel.com>,
        Christoph Hellwig <hch@lst.de>,
        Stefan Roesch <shr@devkernel.io>, Tejun Heo <tj@kernel.org>,
        Xin Hao <xhao@linux.alibaba.com>, Zi Yan <ziy@nvidia.com>,
        Yang Shi <shy828301@gmail.com>,
        Baolin Wang <baolin.wang@linux.alibaba.com>,
        Matthew Wilcox <willy@infradead.org>,
        Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 3/3] migrate_pages: try migrate in batch asynchronously
 firstly
Date: Fri, 24 Feb 2023 22:11:45 +0800
Message-Id: <20230224141145.96814-4-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230224141145.96814-1-ying.huang@intel.com>
References: <20230224141145.96814-1-ying.huang@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When more than one folio is locked, we cannot wait synchronously for a
lock or bit (e.g., page lock, buffer head lock, writeback bit);
otherwise a deadlock may be triggered.  This makes it hard to batch
synchronous migration directly.

This patch re-enables batching for synchronous migration by first
trying to migrate the batch asynchronously.  Any folios that fail to
migrate asynchronously are then migrated synchronously, one by one.
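
The async-first strategy can be sketched as a userspace model.  The
names are hypothetical; needs_wait[i] marks a folio that can only be
migrated by sleeping (for example, one under writeback), the model
assumes every folio eventually migrates once waiting is allowed, and
"one TLB flush per batch" is a simplifying assumption used only to
compare the schemes, not a measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_result {
	int async_migrated;	/* migrated in the MIGRATE_ASYNC batch */
	int sync_migrated;	/* migrated later, one folio at a time */
	int flushes;		/* modeled TLB flushes: one per batch */
};

static struct model_result model_migrate_sync(const bool *needs_wait, size_t n)
{
	struct model_result r = { 0, 0, 0 };
	size_t i;

	assert(n == 0 || needs_wait != NULL);

	/* Phase 1: one async batch; folios that would need a wait are skipped. */
	for (i = 0; i < n; i++)
		if (!needs_wait[i])
			r.async_migrated++;
	if (r.async_migrated > 0)
		r.flushes++;

	/*
	 * Phase 2: each skipped folio becomes a batch of one, so the task
	 * holds no other folio lock and may sleep safely.
	 */
	for (i = 0; i < n; i++) {
		if (needs_wait[i]) {
			r.sync_migrated++;
			r.flushes++;
		}
	}
	return r;
}
```

With {false, false, true, false}, three folios migrate in one async
batch and one synchronously, for two modeled flushes instead of the
four that one-folio batches would need.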

Testing shows that this effectively restores the TLB flush batching
performance of synchronous migration.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Tejun Heo <tj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 91198b487e49..c17ce5ee8d92 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	return rc;
 }
 
+static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
+		free_page_t put_new_page, unsigned long private,
+		enum migrate_mode mode, int reason, struct list_head *ret_folios,
+		struct list_head *split_folios, struct migrate_pages_stats *stats)
+{
+	int rc, nr_failed = 0;
+	LIST_HEAD(folios);
+	struct migrate_pages_stats astats;
+
+	memset(&astats, 0, sizeof(astats));
+	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
+	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
+				 reason, &folios, split_folios, &astats,
+				 NR_MAX_MIGRATE_PAGES_RETRY);
+	stats->nr_succeeded += astats.nr_succeeded;
+	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
+	stats->nr_thp_split += astats.nr_thp_split;
+	if (rc < 0) {
+		stats->nr_failed_pages += astats.nr_failed_pages;
+		stats->nr_thp_failed += astats.nr_thp_failed;
+		list_splice_tail(&folios, ret_folios);
+		return rc;
+	}
+	stats->nr_thp_failed += astats.nr_thp_split;
+	nr_failed += astats.nr_thp_split;
+	/*
+	 * Fall back to migrate all failed folios one by one synchronously. All
+	 * failed folios except split THPs will be retried, so their failure
+	 * isn't counted
+	 */
+	list_splice_tail_init(&folios, from);
+	while (!list_empty(from)) {
+		list_move(from->next, &folios);
+		rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
+					 private, mode, reason, ret_folios,
+					 split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);
+		list_splice_tail_init(&folios, ret_folios);
+		if (rc < 0)
+			return rc;
+		nr_failed += rc;
+	}
+
+	return nr_failed;
+}
+
 /*
  * migrate_pages - migrate the folios specified in a list, to the free folios
  *		   supplied as the target for the page migration
@@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
 {
 	int rc, rc_gather;
-	int nr_pages, batch;
+	int nr_pages;
 	struct folio *folio, *folio2;
 	LIST_HEAD(folios);
 	LIST_HEAD(ret_folios);
@@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	if (rc_gather < 0)
 		goto out;
 
-	if (mode == MIGRATE_ASYNC)
-		batch = NR_MAX_BATCHED_MIGRATION;
-	else
-		batch = 1;
 again:
 	nr_pages = 0;
 	list_for_each_entry_safe(folio, folio2, from, lru) {
@@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		}
 
 		nr_pages += folio_nr_pages(folio);
-		if (nr_pages >= batch)
+		if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
 			break;
 	}
-	if (nr_pages >= batch)
+	if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
 		list_cut_before(&folios, from, &folio2->lru);
 	else
 		list_splice_init(from, &folios);
-	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
-				 mode, reason, &ret_folios, &split_folios, &stats,
-				 NR_MAX_MIGRATE_PAGES_RETRY);
+	if (mode == MIGRATE_ASYNC)
+		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+					 mode, reason, &ret_folios, &split_folios, &stats,
+					 NR_MAX_MIGRATE_PAGES_RETRY);
+	else
+		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
+					mode, reason, &ret_folios, &split_folios, &stats);
 	list_splice_tail_init(&folios, &ret_folios);
 	if (rc < 0) {
 		rc_gather = rc;
-- 
2.39.1