From: Charan Teja Kalla
Subject: [PATCH V2 1/3] mm: page_alloc: unreserve highatomic page blocks before oom
Date: Sun, 5 Nov 2023 18:20:48 +0530
Message-ID: <301b193fcc3e1f91ef30f19ceca06dd6e00b35e1.1699104759.git.quic_charante@quicinc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

__alloc_pages_direct_reclaim() is called from slowpath allocation, where
high atomic reserves can be unreserved if reclaim makes progress and yet
no suitable page is found.
Later, should_reclaim_retry() is called from slowpath allocation to decide
whether reclaim needs to be retried before the OOM kill path is taken.

should_reclaim_retry() checks the available (reclaimable + free pages)
memory against the min wmark level of a zone and returns:
a) true, if it is above the min wmark, so that slowpath allocation
   retries reclaim.
b) false, so that slowpath allocation takes the OOM kill path.

should_reclaim_retry() can also unreserve the high atomic reserves, **but
only after all the reclaim retries are exhausted.**

Consider a case where there is almost no reclaimable memory and the free
pages consist mostly of high atomic reserves that the allocation context
cannot use. The available memory then falls below the min wmark level, so
should_reclaim_retry() returns false and the allocation request takes the
OOM kill path. This can turn into an early OOM kill if the high atomic
reserves hold a lot of free memory and no attempt is made to unreserve
them.

An (early) OOM is encountered on a VM with the below state:
[ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB local_pcp:492kB free_cma:0kB
[ 295.998656] lowmem_reserve[]: 0 32
[ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH) 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7752kB

Per the above log, the ~7MB of free memory that exists in the high atomic
reserves is not freed up before falling back to the OOM kill path.

Fix it by trying to unreserve the high atomic reserves in
should_reclaim_retry() before __alloc_pages_direct_reclaim() can fall back
to the OOM kill path.
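
The watermark arithmetic behind the log above can be sketched as a small
model. This is not the kernel's code: the struct, its field names and
model_should_retry() are hypothetical, lowmem_reserve is ignored, and the
file LRU sizes from the log are assumed to be the only reclaimable memory.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of one zone; all values are in kB. */
struct zone_model {
	long free;                /* free memory, including highatomic reserves */
	long reclaimable;         /* memory that reclaim could still free */
	long reserved_highatomic; /* memory in pageblocks reserved as highatomic */
	long min_wmark;           /* min watermark */
};

/*
 * Model of the check described above: memory held in highatomic reserves
 * is unusable for an ordinary allocation, so it is subtracted before the
 * comparison against the min watermark.
 */
static bool model_should_retry(const struct zone_model *z,
			       bool can_use_highatomic)
{
	long available = z->free + z->reclaimable;

	if (!can_use_highatomic)
		available -= z->reserved_highatomic;

	return available > z->min_wmark;
}

/* Zone state from the log; active_file + inactive_file as reclaimable. */
static const struct zone_model reported_zone = {
	.free = 7728,
	.reclaimable = 24 + 24,
	.reserved_highatomic = 8192,
	.min_wmark = 804,
};
```

With the reported values, model_should_retry(&reported_zone, false) is
false: 7776kB of free plus reclaimable memory minus the 8192kB reserve is
below zero, let alone the 804kB min wmark. Once the reserve is released
(reserved_highatomic set to 0), the same check passes.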

Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Reported-by: Chris Goldsworthy
Suggested-by: Michal Hocko
Signed-off-by: Charan Teja Kalla
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95546f3..e07a38f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3809,14 +3809,9 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 	else
 		(*no_progress_loops)++;
 
-	/*
-	 * Make sure we converge to OOM if we cannot make any progress
-	 * several times in the row.
-	 */
-	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-		/* Before OOM, exhaust highatomic_reserve */
-		return unreserve_highatomic_pageblock(ac, true);
-	}
+	if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+		goto out;
+
 
 	/*
 	 * Keep reclaiming pages while there is a chance this will lead
@@ -3859,6 +3854,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		schedule_timeout_uninterruptible(1);
 	else
 		cond_resched();
+out:
+	/* Before OOM, exhaust highatomic_reserve */
+	if (!ret)
+		return unreserve_highatomic_pageblock(ac, true);
+
 	return ret;
 }
 
-- 
2.7.4

From: Charan Teja Kalla
Subject: [PATCH V2 2/3] mm: page_alloc: correct high atomic reserve calculations
Date: Sun, 5 Nov 2023 18:20:49 +0530
Message-ID: <905d99651423ee85aeb7a71982b95ee9bb05ee99.1699104759.git.quic_charante@quicinc.com>

reserve_highatomic_pageblock() aims to reserve 1% of the managed pages
of a zone, which is used for high-order atomic allocations.

It uses the below calculation to reserve:
static void reserve_highatomic_pageblock(struct page *page, ....) {

   .......
   max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;

   if (zone->nr_reserved_highatomic >= max_managed)
       goto out;

   zone->nr_reserved_highatomic += pageblock_nr_pages;
   set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
   move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);

out:
   ....
}

Since pageblock_nr_pages is always added to 1% of the zone managed pages
count, the effective limit becomes at least 2 pageblocks, because
nr_reserved_highatomic is incremented/decremented in pageblock-sized steps.
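
The effect of adding a full pageblock to the 1% figure can be checked with
a small model. This is a sketch, not kernel code: the helper names are
invented for illustration, and it assumes 4kB pages and a 4MB pageblock
(1024 pages). It compares the bound from the excerpt above with a bound
that rounds 1% up to a pageblock multiple instead.

```c
/*
 * Round x up to a multiple of a (a must be a power of two), in the
 * spirit of the kernel's ALIGN() macro.
 */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bound as computed in the excerpt: 1% of the zone plus one pageblock. */
static unsigned long limit_plus_pageblock(unsigned long managed_pages,
					  unsigned long pageblock_pages)
{
	return managed_pages / 100 + pageblock_pages;
}

/* Alternative bound: 1% of the zone rounded up to whole pageblocks. */
static unsigned long limit_aligned(unsigned long managed_pages,
				   unsigned long pageblock_pages)
{
	return ALIGN_UP(managed_pages / 100, pageblock_pages);
}

/*
 * Count how many pageblocks can be reserved before the
 * "reserved >= limit" check stops further reservation.
 */
static unsigned long pageblocks_reserved(unsigned long limit,
					 unsigned long pageblock_pages)
{
	unsigned long reserved = 0;

	while (reserved < limit)
		reserved += pageblock_pages;

	return reserved / pageblock_pages;
}
```

For a zone with managed:49224kB (12306 pages), the first bound is
123 + 1024 = 1147 pages, which lets two pageblocks (8MB) be reserved. The
aligned bound is 1024 pages, which stops after one pageblock (4MB).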

Encountered a system (actually a VM running on the Linux kernel) with the
below zone configuration:
Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB managed:49224kB

The existing calculation makes it reserve 8MB (with a pageblock size of
4MB), i.e. 16% of the zone managed memory. Reserving such a high amount of
memory can easily exert memory pressure on the system and thus may lead to
unnecessary reclaim until the high atomic reserves are unreserved.

Since high atomic reserves are managed in pageblock-size granules, as
MIGRATE_HIGHATOMIC is set per pageblock, fix the calculation for high
atomic reserves so that the minimum is one pageblock and the maximum is
approximately 1% of the zone managed pages.

Signed-off-by: Charan Teja Kalla
Acked-by: Mel Gorman
---
 mm/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e07a38f..b91c99e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1883,10 +1883,11 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone)
 	unsigned long max_managed, flags;
 
 	/*
-	 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
+	 * The number reserved as: minimum is 1 pageblock, maximum is
+	 * roughly 1% of a zone.
 	 * Check is race-prone but harmless.
 	 */
-	max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
+	max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages);
 	if (zone->nr_reserved_highatomic >= max_managed)
 		return;
 
-- 
2.7.4

From: Charan Teja Kalla
Subject: [PATCH V3 3/3] mm: page_alloc: drain pcp lists before oom kill
Date: Sun, 5 Nov 2023 18:20:50 +0530

pcp lists are drained from __alloc_pages_direct_reclaim(), only if some progress is made
in the attempt.

struct page *__alloc_pages_direct_reclaim() {
    .....
    *did_some_progress = __perform_reclaim(gfp_mask, order, ac);
    if (unlikely(!(*did_some_progress)))
        goto out;
retry:
    page = get_page_from_freelist();
    if (!page && !drained) {
        drain_all_pages(NULL);
        drained = true;
        goto retry;
    }
out:
}

After the above, the allocation attempt can fall back to
should_reclaim_retry() to decide on reclaim retries. If it too returns
false, the allocation request simply falls back to the OOM kill path
without even attempting to drain the pcp pages, which might have helped
the allocation attempt succeed.

A VM running with ~50MB of memory showed the below stats during OOM kill:
Normal free:760kB boost:0kB min:768kB low:960kB high:1152kB reserved_highatomic:0KB managed:49152kB free_pcp:460kB

Though an OOM kill is imminent in such a system state, the current kill
could have been delayed if the pcp lists were drained, as pcp + free
(460kB + 760kB = 1220kB) is even above the high watermark (1152kB).

Fix this missing drain of the pcp lists in should_reclaim_retry(), along
with unreserving the high atomic page blocks, like it is done in
__alloc_pages_direct_reclaim().

Signed-off-by: Charan Teja Kalla
---
 mm/page_alloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b91c99e..8eee292 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3857,8 +3857,10 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		cond_resched();
 out:
 	/* Before OOM, exhaust highatomic_reserve */
-	if (!ret)
-		return unreserve_highatomic_pageblock(ac, true);
+	if (!ret) {
+		ret = unreserve_highatomic_pageblock(ac, true);
+		drain_all_pages(NULL);
+	}
 
 	return ret;
 }
-- 
2.7.4