From: "Sridhar, Kanchana P"
To: Nhat Pham
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	yosryahmed@google.com, ryan.roberts@arm.com, "Huang, Ying",
	21cnbao@gmail.com, akpm@linux-foundation.org, "Zou, Nanhai",
	"Feghali, Wajdi K", "Gopal, Vinodh", "Sridhar, Kanchana P"
Subject: RE: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
Date: Wed, 28 Aug 2024 07:24:18 +0000
References: <20240819021621.29125-1-kanchana.p.sridhar@intel.com>

> -----Original Message-----
> From: Sridhar, Kanchana P
> Sent: Tuesday, August 27, 2024 11:42 AM
> To: Nhat Pham
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> yosryahmed@google.com; ryan.roberts@arm.com; Huang, Ying; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh;
> Sridhar, Kanchana P
> Subject: RE: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
>
>
> > -----Original Message-----
> > From: Nhat Pham
> > Sent: Tuesday, August 27, 2024 8:24 AM
> > To: Sridhar, Kanchana P
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > yosryahmed@google.com; ryan.roberts@arm.com; Huang, Ying; 21cnbao@gmail.com;
> > akpm@linux-foundation.org; Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh
> > Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
> >
> > On Mon, Aug 26, 2024 at 11:08 PM Sridhar, Kanchana P wrote:
> > >
> > > > Internally, we often see 1-3 or 1-4 saving ratio (or even more).
> > >
> > > Agree with this as well. In our experiments with other workloads, we
> > > typically see much higher ratios.
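
As an aside, the ratios in the table below can be cross-checked at runtime
from zswap's debugfs counters. A rough userspace sketch (illustrative only,
not part of the benchmarking setup described here; it assumes debugfs is
mounted at /sys/kernel/debug and that the kernel exposes stored_pages and
pool_total_size, as recent CONFIG_ZSWAP kernels do):

/*
 * Rough cross-check for zswap compression ratio:
 * (uncompressed bytes stored) / (compressed pool bytes).
 * Run as root; prints nothing useful if zswap has stored no pages yet.
 */
#include <stdio.h>
#include <unistd.h>

static long long read_counter(const char *path)
{
	long long val = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%lld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	long long pages = read_counter("/sys/kernel/debug/zswap/stored_pages");
	long long pool  = read_counter("/sys/kernel/debug/zswap/pool_total_size");
	long page_size  = sysconf(_SC_PAGESIZE);

	if (pages < 0 || pool <= 0) {
		fprintf(stderr, "zswap debugfs counters not available\n");
		return 1;
	}
	printf("stored: %lld pages, pool: %lld bytes, ratio: %.2f\n",
	       pages, pool, (double)(pages * page_size) / (double)pool);
	return 0;
}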
> > > >
> > > > Probably does not explain everything, but worth double checking -
> > > > could you check with zstd to see if the ratio improves.
> > >
> > > Sure. I gathered ratio and compressed memory footprint data today with
> > > 64K mTHP, the 4G SSD swapfile and different zswap compressors.
> > >
> > > This patch-series and no zswap charging, 64K mTHP:
> > > -----------------------------------------------------------------------------
> > >                  Total          Total        Average     Average      Comp
> > >                  compressed     compression  compressed  compression  ratio
> > >                  length         latency      length      latency
> > >                  bytes          milliseconds bytes       nanoseconds
> > > -----------------------------------------------------------------------------
> > > SSD (no zswap)   1,362,296,832       887,861
> > > lz4              2,610,657,430        55,984       2,055      44,065   1.99
> > > zstd               729,129,528        50,986         565      39,510   7.25
> > > deflate-iaa      1,286,533,438        44,785       1,415      49,252   2.89
> > > -----------------------------------------------------------------------------
> > >
> > > zstd does very well on ratio, as expected.
> >
> > Wait. So zstd is displaying a 7-to-1 compression ratio? And has *lower*
> > average latency?
> >
> > Why are we running benchmarks on lz4 again? Sure, there is no free lunch
> > and no compressor that works well on all kinds of data, but lz4's
> > performance here is so bad that it's borderline justifiable to
> > disable/bypass zswap with this kind of compression ratio...
> >
> > Can I ask you to run benchmarking on zstd from now on?
>
> Sure, will do.
>
> > > > > Experiment 3 - 4K folios swap characteristics SSD vs. ZSWAP:
> > > > > ------------------------------------------------------------
> > > > >
> > > > > I wanted to take a step back and understand how the mainline v6.11-rc3
> > > > > handles 4K folios when swapped out to SSD (CONFIG_ZSWAP is off) and when
> > > > > swapped out to ZSWAP. Interestingly, higher swapout activity is observed
> > > > > with 4K folios and v6.11-rc3 (with the debug change to not charge zswap
> > > > > to the cgroup).
> > > > >
> > > > > v6.11-rc3 with no zswap charge, only 4K folios, no (m)THP:
> > > > >
> > > > > -------------------------------------------------------------
> > > > > SSD (CONFIG_ZSWAP is OFF)      ZSWAP           lz4    lzo-rle
> > > > > -------------------------------------------------------------
> > > > > cgroup memory.events:          cgroup memory.events:
> > > > >
> > > > > low              0             low               0          0
> > > > > high         5,068             high        321,923    375,116
> > > > > max              0             max               0          0
> > > > > oom              0             oom               0          0
> > > > > oom_kill         0             oom_kill          0          0
> > > > > oom_group_kill   0             oom_group_kill    0          0
> > > > > -------------------------------------------------------------
> > > > >
> > > > > SSD (CONFIG_ZSWAP is OFF):
> > > > > --------------------------
> > > > > pswpout            415,709
> > > > > sys time (sec)      301.02
> > > > > Throughput KB/s    155,970
> > > > > memcg_high events    5,068
> > > > > --------------------------
> > > > >
> > > > > ZSWAP                    lz4        lz4        lz4    lzo-rle
> > > > > --------------------------------------------------------------
> > > > > zswpout            1,598,550  1,515,151  1,449,432  1,493,917
> > > > > sys time (sec)        889.36     481.21     581.22     635.75
> > > > > Throughput KB/s       35,176     14,765     20,253     21,407
> > > > > memcg_high events    321,923    412,733    369,976    375,116
> > > > > --------------------------------------------------------------
> > > > >
> > > > > This shows that there is a performance regression of -60% to -195%
> > > > > with zswap as compared to SSD with 4K folios.
> > > > > The higher swapout activity with zswap is seen here too (i.e., this
> > > > > doesn't appear to be mTHP-specific).
> > > > >
> > > > > I verified this to be the case even with the v6.7 kernel, which also
> > > > > showed a 2.3X throughput improvement when we don't charge zswap:
> > > > >
> > > > > ZSWAP lz4                 v6.7    v6.7 with no cgroup zswap charge
> > > > > --------------------------------------------------------------------
> > > > > zswpout              1,419,802    1,398,620
> > > > > sys time (sec)           535.4       613.41
> > > >
> > > > systime increases without zswap cgroup charging? That's strange...
> > >
> > > Additional data gathered with v6.11-rc3 (listed below) based on your
> > > suggestion to investigate potential swap.high breaches should hopefully
> > > provide some explanation.
> > >
> > > > > Throughput KB/s          8,671       20,045
> > > > > memcg_high events      574,046      451,859
> > > >
> > > > So, on a 4K folio setup, even without the cgroup charge, we are still
> > > > seeing:
> > > >
> > > > 1. More zswpout (than observed in SSD)
> > > > 2. 40-50% worse latency - in fact it is worse without zswap cgroup
> > > >    charging.
> > > > 3. 100 times the amount of memcg_high events? This is perhaps the
> > > >    *strangest* to me. You're already removing zswap cgroup charging,
> > > >    then where does this come from? How can we have a memory.high
> > > >    violation when zswap does *not* contribute to memory usage?
> > > >
> > > > Is this due to swap limit charging? Do you have a cgroup swap limit?
> > > >
> > > > mem_high = page_counter_read(&memcg->memory) >
> > > >            READ_ONCE(memcg->memory.high);
> > > > swap_high = page_counter_read(&memcg->swap) >
> > > >             READ_ONCE(memcg->swap.high);
> > > > [...]
> > > >
> > > > if (mem_high || swap_high) {
> > > >         /*
> > > >          * The allocating tasks in this cgroup will need to do
> > > >          * reclaim or be throttled to prevent further growth
> > > >          * of the memory or swap footprints.
> > > >          *
> > > >          * Target some best-effort fairness between the tasks,
> > > >          * and distribute reclaim work and delay penalties
> > > >          * based on how much each task is actually allocating.
> > > >          */
> > > >         current->memcg_nr_pages_over_high += batch;
> > > >         set_notify_resume(current);
> > > >         break;
> > > > }
> > >
> > > I don't have a swap.high limit set on the cgroup; it is set to "max".
> > >
> > > I ran experiments with v6.11-rc3, no zswap charging, 4K folios and
> > > different zswap compressors to verify if swap.high is breached with the
> > > 4G SSD swapfile.
> > >
> > > SSD (CONFIG_ZSWAP is OFF):
> > >
> > >                               SSD        SSD        SSD
> > > ------------------------------------------------------------
> > > pswpout                   415,709  1,032,170    636,582
> > > sys time (sec)             301.02     328.15     306.98
> > > Throughput KB/s           155,970     89,621    122,219
> > > memcg_high events           5,068     15,072      8,344
> > > memcg_swap_high events          0          0          0
> > > memcg_swap_fail events          0          0          0
> > > ------------------------------------------------------------
> > >
> > > ZSWAP                        zstd       zstd       zstd
> > > ----------------------------------------------------------------
> > > zswpout                 1,391,524  1,382,965  1,417,307
> > > sys time (sec)             474.68     568.24     489.80
> > > Throughput KB/s            26,099     23,404    111,115
> > > memcg_high events         335,112    340,335    162,260
> > > memcg_swap_high events          0          0          0
> > > memcg_swap_fail events  1,226,899  5,742,153
> > >   (mem_cgroup_try_charge_swap)
> > > memcg_memory_stat_pgactivate       1,259,547
> > >   (shrink_folio_list)
> > > ----------------------------------------------------------------
> > >
> > > ZSWAP                     lzo-rle    lzo-rle    lzo-rle
> > > -----------------------------------------------------------
> > > zswpout                 1,493,917  1,363,040  1,428,133
> > > sys time (sec)             635.75     498.63     484.65
> > > Throughput KB/s            21,407     23,827     20,237
> > > memcg_high events         375,116    352,814    373,667
> > > memcg_swap_high events          0          0          0
> > > memcg_swap_fail events    715,211
> > > -----------------------------------------------------------
> > >
> > > ZSWAP                         lz4        lz4        lz4        lz4
> > > ---------------------------------------------------------------------
> > > zswpout                 1,378,781  1,598,550  1,515,151  1,449,432
> > > sys time (sec)             495.45     889.36     481.21     581.22
> > > Throughput KB/s            26,248     35,176     14,765     20,253
> > > memcg_high events         347,209    321,923    412,733    369,976
> > > memcg_swap_high events          0          0          0          0
> > > memcg_swap_fail events    580,103          0
> > > ---------------------------------------------------------------------
> > >
> > > ZSWAP                   deflate-iaa  deflate-iaa  deflate-iaa
> > > ----------------------------------------------------------------
> > > zswpout                     380,471    1,440,902    1,397,965
> > > sys time (sec)               329.06       570.77       467.41
> > > Throughput KB/s             283,867       28,403      190,600
> > > memcg_high events             5,551      422,831       28,154
> > > memcg_swap_high events            0            0            0
> > > memcg_swap_fail events            0    2,686,758      438,562
> > > ----------------------------------------------------------------
> >
> > Why are there 3 columns for each of the compressors? Is this different
> > runs of the same workload?
> >
> > And why do some columns have missing cells?
>
> Yes, these are different runs of the same workload. Since there is some
> amount of variance seen in the data, I figured it is best to publish the
> metrics from the individual runs rather than averaging.
>
> Some of these runs were gathered earlier with the same code base; however,
> I wasn't monitoring/logging the memcg_swap_high/memcg_swap_fail events at
> that time. For those runs, just these two counters have missing column
> entries; the rest of the data is still valid.
>
> > > There are no swap.high memcg events recorded in any of the SSD/zswap
> > > experiments. However, I do see a significant number of memcg_swap_fail
> > > events in some of the zswap runs, for all 3 compressors. This is not
> > > consistent, because there are some runs with 0 memcg_swap_fail for all
> > > compressors.
> > >
> > > There is a possible correlation between memcg_swap_fail events
> > > (/sys/fs/cgroup/test/memory.swap.events) and the high # of memcg_high
> > > events.
> > > The root cause appears to be that there are no available swap slots:
> > > memcg_swap_fail is incremented, add_to_swap() fails in
> > > shrink_folio_list(), followed by "activate_locked:" for the folio.
> > > The folio re-activation is recorded in cgroup memory.stat pgactivate
> > > events. The failure to swap out folios due to lack of swap slots could
> > > contribute towards memory.high breaches.
> >
> > Yeah FWIW, that was gonna be my first suggestion. This swapfile size
> > is wayyyy too small...
> >
> > But that said, the link is not clear to me at all. The only thing I
> > can think of is lz4's performance sucks so bad that it's not saving
> > enough memory, leading to regression. And since it's still taking up
> > a swap slot, we cannot use swap either?
>
> The occurrence of memcg_swap_fail events establishes that swap slots
> are not available with 4G of swap space. This causes those 4K folios to
> remain in memory, which can worsen an existing problem with memory.high
> breaches.
>
> However, it is worth noting that this is not the only contributor to
> memcg_high events that still occur without zswap charging. The data shows
> 321,923 occurrences of memcg_high in Col 2 of the lz4 table, which also has
> 0 occurrences of memcg_swap_fail reported in the cgroup stats.
>
> > > However, this is probably not the only cause of either the high # of
> > > memory.high breaches or the over-reclaim with zswap, as seen in the lz4
> > > data where the memcg_high count is significant even in cases where there
> > > are no memcg_swap_fails.
> > >
> > > Some observations/questions based on the above 4K folios swapout data:
> > >
> > > 1) There are more memcg_high events as the swapout latency reduces
> > >    (i.e. faster swap-write path). This is even without charging zswap
> > >    utilization to the cgroup.
> >
> > This is still inexplicable to me. If we are not charging zswap usage,
> > we shouldn't even be triggering the reclaim_high() path, no?
> >
> > I'm curious - can you use bpftrace to track where/when reclaim_high
> > is being called?

Hi Nhat,

Since reclaim_high() is called only in a handful of places, I figured
I would just use debugfs u64 counters to record where it gets called
from.
These are the places where I increment the debugfs counters:

include/linux/resume_user_mode.h:
---------------------------------

diff --git a/include/linux/resume_user_mode.h b/include/linux/resume_user_mode.h
index e0135e0adae0..382f5469e9a2 100644
--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -24,6 +24,7 @@ static inline void set_notify_resume(struct task_struct *task)
 		kick_process(task);
 }
 
+extern u64 hoh_userland;
 
 /**
  * resume_user_mode_work - Perform work before returning to user mode
@@ -56,6 +57,7 @@ static inline void resume_user_mode_work(struct pt_regs *regs)
 	}
 #endif
 
+	++hoh_userland;
 	mem_cgroup_handle_over_high(GFP_KERNEL);
 	blkcg_maybe_throttle_current();
 

mm/memcontrol.c:
----------------

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f29157288b7d..6738bb670a78 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1910,9 +1910,12 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 	return nr_reclaimed;
 }
 
+extern u64 rec_high_hwf;
+
 static void high_work_func(struct work_struct *work)
 {
 	struct mem_cgroup *memcg;
+	++rec_high_hwf;
 
 	memcg = container_of(work, struct mem_cgroup, high_work);
 	reclaim_high(memcg, MEMCG_CHARGE_BATCH, GFP_KERNEL);
@@ -2055,6 +2058,8 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
 	return penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH;
 }
 
+extern u64 rec_high_hoh;
+
 /*
  * Reclaims memory over the high limit. Called directly from
  * try_charge() (context permitting), as well as from the userland
@@ -2097,6 +2102,7 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 	 * memory.high is currently batched, whereas memory.max and the page
 	 * allocator run every time an allocation is made.
 	 */
+	++rec_high_hoh;
 	nr_reclaimed = reclaim_high(memcg,
 				    in_retry ? SWAP_CLUSTER_MAX : nr_pages,
 				    gfp_mask);
@@ -2153,6 +2159,8 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 	css_put(&memcg->css);
 }
 
+extern u64 hoh_trycharge;
+
 int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		     unsigned int nr_pages)
 {
@@ -2344,8 +2352,10 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
 	    !(current->flags & PF_MEMALLOC) &&
-	    gfpflags_allow_blocking(gfp_mask))
+	    gfpflags_allow_blocking(gfp_mask)) {
+		++hoh_trycharge;
 		mem_cgroup_handle_over_high(gfp_mask);
+	}
 	return 0;
 }

I reverted my debug changes for "zswap to not charge cgroup" when I ran
this next set of experiments that record the # of times, and the locations
from which, reclaim_high() is called. zstd is the compressor I have
configured for both ZSWAP and ZRAM.
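
For completeness: the diffs above only declare and increment the counters;
they still have to be defined and exposed somewhere. A minimal sketch of
that side (illustrative only, not part of the posted diffs; the directory
and file names are my assumptions):

/*
 * Hypothetical companion to the diffs above: define the four counters and
 * expose them read-only via debugfs, so each location's call count can be
 * read with e.g. cat /sys/kernel/debug/reclaim_high_dbg/rec_high_hoh.
 * The increments elsewhere are plain non-atomic u64 updates, which is
 * acceptable for this kind of rough debug counting.
 */
#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/types.h>

u64 hoh_userland;
u64 hoh_trycharge;
u64 rec_high_hoh;
u64 rec_high_hwf;

static int __init reclaim_high_dbg_init(void)
{
	struct dentry *dir = debugfs_create_dir("reclaim_high_dbg", NULL);

	debugfs_create_u64("hoh_userland", 0444, dir, &hoh_userland);
	debugfs_create_u64("hoh_trycharge", 0444, dir, &hoh_trycharge);
	debugfs_create_u64("rec_high_hoh", 0444, dir, &rec_high_hoh);
	debugfs_create_u64("rec_high_hwf", 0444, dir, &rec_high_hwf);
	return 0;
}
late_initcall(reclaim_high_dbg_init);

Reading the four files before and after a run then gives the per-location
call counts reported below.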
6.11-rc3 mainline, 176Gi ZRAM backing for ZSWAP, zstd, 64K mTHP:
----------------------------------------------------------------
/sys/fs/cgroup/iax/memory.events:
  high           112,910

hoh_userland     128,835
hoh_trycharge          0
rec_high_hoh     113,079
rec_high_hwf           0

6.11-rc3 mainline, 4G SSD backing for ZSWAP, zstd, 64K mTHP:
------------------------------------------------------------
/sys/fs/cgroup/iax/memory.events:
  high             4,693

hoh_userland      14,069
hoh_trycharge          0
rec_high_hoh       4,694
rec_high_hwf           0

ZSWAP-mTHP, 176Gi ZRAM backing for ZSWAP, zstd, 64K mTHP:
---------------------------------------------------------
/sys/fs/cgroup/iax/memory.events:
  high           139,495

hoh_userland     156,628
hoh_trycharge          0
rec_high_hoh     140,039
rec_high_hwf           0

ZSWAP-mTHP, 4G SSD backing for ZSWAP, zstd, 64K mTHP:
-----------------------------------------------------
/sys/fs/cgroup/iax/memory.events:
  high            20,427

/sys/fs/cgroup/iax/memory.swap.events:
  fail            20,856

hoh_userland      31,346
hoh_trycharge          0
rec_high_hoh      20,513
rec_high_hwf           0

This shows that in all cases, reclaim_high() is called only from the
return path to user mode after handling a page fault.

Thanks,
Kanchana

>
> I had confirmed earlier with counters that all calls to reclaim_high()
> were from include/linux/resume_user_mode.h::resume_user_mode_work().
> I will confirm this with zstd and bpftrace and share.
>
> Thanks,
> Kanchana
>
> > > 2) There appears to be a direct correlation between a higher # of
> > >    memcg_swap_fail events, an increase in memcg_high breaches, and a
> > >    reduction in usemem throughput. Combined with observation (1), this
> > >    suggests that with a faster compressor we need more swap slots, which
> > >    increases the probability of running out of swap slots with the 4G
> > >    SSD backing device.
> > >
> > > 3) Could the data shared earlier on the reduction in memcg_high breaches
> > >    with 64K mTHP swapout provide some more clues, if we agree with (1)
> > >    and (2):
> > >
> > >    "Interestingly, the # of memcg_high events reduces significantly with
> > >    64K mTHP as compared to the above 4K memcg_high events data, when
> > >    tested with v4 and no zswap charge: 3,069 (SSD-mTHP) and 19,656
> > >    (ZSWAP-mTHP)."
> > >
> > > 4) In the case of each zswap compressor, there are some runs that go
> > >    through with 0 memcg_swap_fail events. These runs generally have
> > >    fewer memcg_high breaches and better sys time/throughput.
> > >
> > > 5) For a given swap setup, there is some amount of variance in
> > >    sys time for this workload.
> > >
> > > 6) All this suggests that the primary root cause is the concurrency
> > >    setup, where there could be randomness between runs as to the # of
> > >    processes that observe the memory.high breach, due to other factors
> > >    such as the availability of swap slots for allocation.
> > >
> > > To summarize, I believe the root cause is the 4G SSD swapfile resulting
> > > in running out of swap slots, and anomalous over-reclaim behavior when
> > > 70 concurrent processes are working against the 60G memory limit while
> > > each tries to allocate 1G, with randomness in how processes react to
> > > the breach.
> > >
> > > The cgroup zswap charging exacerbates this situation, but is not a
> > > problem in and of itself.
> > >
> > > Nhat, as you pointed out, this is somewhat of an unrealistic scenario
> > > that doesn't seem to indicate any specific problems to be solved, other
> > > than the temporary cgroup zswap double-charging.
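
For reference, the memcg_high / memcg_swap_high / memcg_swap_fail numbers
quoted throughout this thread come from the cgroup's memory.events and
memory.swap.events files. A small illustrative helper for snapshotting them
around a run (the actual collection scripts are not part of this thread;
the default cgroup path is an assumption):

/*
 * Snapshot the per-cgroup event counters discussed above, e.g.
 *   ./memcg_events /sys/fs/cgroup/test
 * Diffing two snapshots taken before and after a run gives the per-run
 * deltas for memcg_high, memcg_swap_high and memcg_swap_fail.
 */
#include <stdio.h>
#include <string.h>

static long long find_event(const char *path, const char *key)
{
	char name[64];
	long long val, found = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fscanf(f, "%63s %lld", name, &val) == 2) {
		if (!strcmp(name, key))
			found = val;
	}
	fclose(f);
	return found;
}

int main(int argc, char **argv)
{
	char path[256];
	const char *cg = argc > 1 ? argv[1] : "/sys/fs/cgroup/test";

	snprintf(path, sizeof(path), "%s/memory.events", cg);
	printf("memcg_high        %lld\n", find_event(path, "high"));

	snprintf(path, sizeof(path), "%s/memory.swap.events", cg);
	printf("memcg_swap_high   %lld\n", find_event(path, "high"));
	printf("memcg_swap_fail   %lld\n", find_event(path, "fail"));
	return 0;
}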
> > >
> > > Would it be fair to evaluate this patch-series based on a more realistic
> > > swapfile configuration based on 176G ZRAM, for which I had shared the
> > > data in v2? There weren't any problems with swap slot availability or
> > > any anomalies that I can think of with this setup, other than the fact
> > > that the "Before" and "After" sys times could not be directly compared
> > > for 2 key reasons:
> > >
> > > - ZRAM compressed data is not charged to the cgroup, similar to SSD.
> > > - ZSWAP compressed data is charged to the cgroup.
> >
> > Yeah, that's a bit unfair still. Wild idea, but what about comparing
> > SSD without zswap (or SSD with zswap, but without this patch series so
> > that mTHP are not zswapped) vs. zswap-on-zram (i.e., with a backing
> > swapfile on a zram block device)?
> >
> > It is stupid, I know. But let's take advantage of the fact that zram
> > is not charged to the cgroup, pretending that its memory footprint is
> > empty?
> >
> > I don't know how zram works though, so my apologies if it's a stupid
> > suggestion :)