From: Vernon Yang
To: akpm@linux-foundation.org, david@kernel.org, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, npache@redhat.com, baohua@kernel.org,
 lance.yang@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Vernon Yang
Subject: [PATCH 3/4] mm: khugepaged: move mm to list tail when MADV_COLD/MADV_FREE
Date: Mon, 15 Dec 2025 17:04:18 +0800
Message-ID: <20251215090419.174418-4-yanglincheng@kylinos.cn>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251215090419.174418-1-yanglincheng@kylinos.cn>
References: <20251215090419.174418-1-yanglincheng@kylinos.cn>

For example, create three tasks: hot1 -> cold -> hot2. After all three
tasks are created, each allocates 128 MB of memory. The hot1 and hot2
tasks continuously access their 128 MB of memory, while the cold task
accesses its memory only briefly and then calls madvise(MADV_COLD).
However, khugepaged still scans the cold task first and only scans the
hot2 task after it has finished scanning the cold task.
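
As a rough illustration of what the cold task in this scenario does, the
user-space sketch below maps a buffer, touches it once, and then calls
madvise(MADV_COLD). The helper name cold_task_setup() and its return
convention are assumptions made for this example; they are not part of
the patch or of the test program. The fallback definition of MADV_COLD
is only there for older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* Linux UAPI value, available since Linux 5.4 */
#endif

/*
 * Hypothetical helper (not from the patch): map len bytes of anonymous
 * memory, touch every page once, then tell the kernel the range is cold.
 * On return *out points to the mapping (NULL if mmap() failed); the
 * caller keeps it mapped, like the idle cold task in the scenario.
 * Returns 0 on success, -1 if mmap() fails, 1 if madvise() is rejected.
 */
int cold_task_setup(size_t len, void **out)
{
	assert(len > 0 && out != NULL);

	*out = NULL;
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 1, len);	/* brief access: fault every page in once */
	*out = buf;

	/* Kernels without MADV_COLD reject the call; report that as 1. */
	return madvise(buf, len, MADV_COLD) == 0 ? 0 : 1;
}
```

A hot task would create the same mapping but keep reading the buffer in
a loop instead of calling madvise().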

So if the user has explicitly told us via MADV_COLD/MADV_FREE that this
memory is cold or will be freed, it is appropriate for khugepaged to
scan it as late as possible, thereby avoiding unnecessary scan and
collapse operations and reducing wasted CPU time.

Here are the performance test results (for Throughput, higher is
better; for the other metrics, lower is better):

Testing on x86_64 machine:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.14 sec      | 2.92 sec      | -7.01%  |
| cycles per access   | 4.91          | 2.07          | -57.84% |
| Throughput          | 104.38 M/sec  | 112.12 M/sec  | +7.42%  |
| dTLB-load-misses    | 288966432     | 1292908       | -99.55% |

Testing on qemu-system-x86_64 -enable-kvm:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.35 sec      | 2.96 sec      | -11.64% |
| cycles per access   | 7.23          | 2.12          | -70.68% |
| Throughput          | 97.88 M/sec   | 110.76 M/sec  | +13.16% |
| dTLB-load-misses    | 237406497     | 3189194       | -98.66% |

Signed-off-by: Vernon Yang
---
 include/linux/khugepaged.h |  1 +
 mm/khugepaged.c            | 14 ++++++++++++++
 mm/madvise.c               |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..726e99de84e9 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -15,6 +15,7 @@ extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern void khugepaged_enter_vma(struct vm_area_struct *vma,
 				 vm_flags_t vm_flags);
+void khugepaged_move_tail(struct mm_struct *mm);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
 extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1ec1af5be3c8..91836dda2015 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -468,6 +468,20 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 	}
 }
 
+void khugepaged_move_tail(struct mm_struct *mm)
+{
+	struct mm_slot *slot;
+
+	if (!mm_flags_test(MMF_VM_HUGEPAGE, mm))
+		return;
+
+	spin_lock(&khugepaged_mm_lock);
+	slot = mm_slot_lookup(mm_slots_hash, mm);
+	if (slot && khugepaged_scan.mm_slot != slot)
+		list_move_tail(&slot->mm_node, &khugepaged_scan.mm_head);
+	spin_unlock(&khugepaged_mm_lock);
+}
+
 void __khugepaged_exit(struct mm_struct *mm)
 {
 	struct mm_slot *slot;
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..3f9ca7af2c82 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -608,6 +608,8 @@ static long madvise_cold(struct madvise_behavior *madv_behavior)
 	madvise_cold_page_range(&tlb, madv_behavior);
 	tlb_finish_mmu(&tlb);
 
+	khugepaged_move_tail(vma->vm_mm);
+
 	return 0;
 }
 
@@ -835,6 +837,7 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior)
 			&walk_ops, tlb);
 	tlb_end_vma(tlb, vma);
 	mmu_notifier_invalidate_range_end(&range);
+	khugepaged_move_tail(mm);
 	return 0;
 }
 
-- 
2.51.0
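
To make the reordering in khugepaged_move_tail() easier to picture, here
is a minimal user-space sketch of the same idea: a slot is moved to the
tail of the scan list unless it is the slot currently being scanned. It
is only a model, not kernel code. The names list_node, scan_slot,
slot_move_tail() and slot_at() are hypothetical, and the sketch omits
the hash lookup and the khugepaged_mm_lock locking that the patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical stand-in for an mm slot; node must stay the first member. */
struct scan_slot {
	struct list_node node;
	const char *name;
};

void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Insert n just before head, i.e. at the tail of the list. */
void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Move slot to the tail of the scan list, unless it is the slot that
 * the scanner is working on right now (cur), which stays in place.
 */
void slot_move_tail(struct list_node *head, struct scan_slot *slot,
		    const struct scan_slot *cur)
{
	assert(head != NULL && slot != NULL);

	if (slot == cur)
		return;

	/* Unlink from the current position, then re-add at the tail. */
	slot->node.prev->next = slot->node.next;
	slot->node.next->prev = slot->node.prev;
	list_append(&slot->node, head);
}

/* Return the slot at position idx (0 = scanned first), or NULL. */
struct scan_slot *slot_at(struct list_node *head, size_t idx)
{
	struct list_node *n = head->next;

	while (n != head && idx--)
		n = n->next;
	/* Valid cast because node is the first member of scan_slot. */
	return n == head ? NULL : (struct scan_slot *)n;
}
```

With slots appended in the order hot1, cold, hot2 and hot1 being the
current slot, calling slot_move_tail() on cold leaves the scan order as
hot1, hot2, cold, which is the ordering the commit message argues for.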