From nobody Tue Dec 16 18:02:24 2025 Received: from esa4.hc1455-7.c3s2.iphmx.com (esa4.hc1455-7.c3s2.iphmx.com [68.232.139.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0239D8BFC for ; Mon, 22 Jul 2024 02:12:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=68.232.139.117 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721614373; cv=none; b=BpZMfiQMzXnxLPrZJYu5+7PNM6b7jQ25gW/XatH4GMe2F+1xEnfG5LxM3GFsciLizHzVNSZ/4bOlOSDHQ34rb6blkwh9APTv6JiYOHr4Ua49nhPMzY/9m0sZUgBysCzlin6jx4z+wySq/20WFztV7JRz/MGgOwM8vibrpcwSMVY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721614373; c=relaxed/simple; bh=I5DMe3EnS7G1j/NeL0/4/YwmynXJRTUqok3lFtFxT7s=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=retlI/JX10J7AqKZ5+eEPRwBc8xejrpspyululqnLZosK/5hVCjYeaHrara+PrOnGYxssINxXYfzYGQZs0HHb44qp1ltaMQBP6XntyPB6ywd7rZtngbeDWJG/Ff9cLsbDYs6i/wcnyiJJw95zQdVZ+TOHv2zF8fcI1fIplkZa9o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fujitsu.com; spf=pass smtp.mailfrom=fujitsu.com; dkim=pass (2048-bit key) header.d=fujitsu.com header.i=@fujitsu.com header.b=rUugE5tP; arc=none smtp.client-ip=68.232.139.117 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fujitsu.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fujitsu.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fujitsu.com header.i=@fujitsu.com header.b="rUugE5tP" DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=fujitsu.com; i=@fujitsu.com; q=dns/txt; s=fj2; t=1721614369; x=1753150369; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=I5DMe3EnS7G1j/NeL0/4/YwmynXJRTUqok3lFtFxT7s=; b=rUugE5tPPdI7x0RZzx7dwnOgU4cNA+KcHHX3E9iwPRTf8h2eyC2pyxIF H/ECRBk1CdhqnB8HS9PepVj6/LoyhNiwuUwiC+Ic8ZRic9jN9HltLyV5L 9t64/iAWThb/r8uqSnONRUFd8xQb8vZ0RNdKUgJLQ94Q63bfkarHVfSSw cWxGWG1Vf4SJMp1+duP/2wo6Q0zAUIRU2UFmPJ/LPaEkZjdxquOs1CKks pneSgGWt8OmPBWthScXv7TVtDdLAA2Xk6DA06QkA1oZN2xrJEILQ252ZS OK9juKICFJPEfDcN5JPAD10lUwFH3DX7853CLOf9raUL5PEqj4dVm5Ejv w==; X-IronPort-AV: E=McAfee;i="6700,10204,11140"; a="168168824" X-IronPort-AV: E=Sophos;i="6.09,227,1716217200"; d="scan'208";a="168168824" Received: from unknown (HELO oym-r4.gw.nic.fujitsu.com) ([210.162.30.92]) by esa4.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jul 2024 11:11:21 +0900 Received: from oym-m3.gw.nic.fujitsu.com (oym-nat-oym-m3.gw.nic.fujitsu.com [192.168.87.60]) by oym-r4.gw.nic.fujitsu.com (Postfix) with ESMTP id 46DD3D800F for ; Mon, 22 Jul 2024 11:11:19 +0900 (JST) Received: from kws-ab3.gw.nic.fujitsu.com (kws-ab3.gw.nic.fujitsu.com [192.51.206.21]) by oym-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id 8E4C3D7494 for ; Mon, 22 Jul 2024 11:11:18 +0900 (JST) Received: from edo.cn.fujitsu.com (edo.cn.fujitsu.com [10.167.33.5]) by kws-ab3.gw.nic.fujitsu.com (Postfix) with ESMTP id 2F28A2007CAB2 for ; Mon, 22 Jul 2024 11:11:18 +0900 (JST) Received: from localhost.localdomain (unknown [10.167.226.45]) by edo.cn.fujitsu.com (Postfix) with ESMTP id 57B821A000B; Mon, 22 Jul 2024 10:11:17 +0800 (CST) From: Li Zhijian To: linux-mm@kvack.org Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Yasunori Gotou , Li Zhijian , David Hildenbrand , Vlastimil Babka , Yao Xingtao Subject: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() Date: Mon, 22 Jul 2024 10:10:59 +0800 Message-Id: <20240722021059.1076399-1-lizhijian@fujitsu.com> X-Mailer: git-send-email 2.31.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-28544.004 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-28544.004 X-TMASE-Result: 10--5.996900-10.000000 X-TMASE-MatchedRID: +R0/YEDHEDjoSitJVour/QuB7zdAMUjAl9q75JzWJRMarX6LchMkVuBw lzWEEXt2fatVQLA534yNMkq6FfSn6lnGEjlsas2yFDuTLTe6zcNMkOX0UoduuV7V7de6UnlgmKb hu5KaCkf9F5gpB/8TUo2MogdbmQhJWSEm/dnndoSdVNZaI2n6//SzAdIVxUno2vch1fMqmI8mIm l+ywrqvklEFjVj/aAsbncztPPsTqsv+0FNnM7lDRVqL8+WwS7muhv94WF6cmmm04TWLzKiuBhBv WgZlX+84vM1YF6AJbbCCfuIMF6xLSAHAopEd76vdp8SlsBStysjUE3BmlSs165ZdugdSt2/9b4c 9nQDp+sRP1XlvFsUag== X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Content-Type: text/plain; charset="utf-8" It's expected that no page should be left in pcp_list after calling zone_pcp_disable() in offline_pages(). Previously, it's observed that offline_pages() gets stuck [1] due to some pages remaining in pcp_list. Cause: There is a race condition between drain_pages_zone() and __rmqueue_pcplist() involving the pcp->count variable. See below scenario: CPU0 CPU1 ---------------- --------------- spin_lock(&pcp->lock); __rmqueue_pcplist() { zone_pcp_disable() { /* list is empty */ if (list_empty(list)) { /* add pages to pcp_list */ alloced =3D rmqueue_bulk() mutex_lock(&pcp_batch_high_lock) ... __drain_all_pages() { drain_pages_zone() { /* read pcp->count, it's 0 here */ count =3D READ_ONCE(pcp->count) /* 0 means nothing to drain */ /* update pcp->count */ pcp->count +=3D alloced << order; ... ... spin_unlock(&pcp->lock); In this case, after calling zone_pcp_disable() though, there are still some pages in pcp_list. And these pages in pcp_list are neither movable nor isolated, offline_pages() gets stuck as a result. Solution: Expand the scope of the pcp->lock to also protect pcp->count in drain_pages_zone(), to ensure no pages are left in the pcp list after zone_pcp_disable() [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@f= ujitsu.com/ Cc: David Hildenbrand Cc: Vlastimil Babka (SUSE) Reported-by: Yao Xingtao Signed-off-by: Li Zhijian --- V2: - Narrow down the scope of the spin_lock() to limit the draining latenc= y. # Vlastimil and David - In above scenario, it's sufficient to read pcp->count once with lock = held, and it fully fixed my issue[1] in thounds runs(It happened in more than 5% before). RFC: https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fuji= tsu.com/ --- mm/page_alloc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9ecf99190ea2..5388a35c4e9c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_= cpu_pages *pcp) static void drain_pages_zone(unsigned int cpu, struct zone *zone) { struct per_cpu_pages *pcp =3D per_cpu_ptr(zone->per_cpu_pageset, cpu); - int count =3D READ_ONCE(pcp->count); + int count; =20 + spin_lock(&pcp->lock); + count =3D pcp->count; + spin_unlock(&pcp->lock); while (count) { int to_drain =3D min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX); count -=3D to_drain; --=20 2.29.2