From: Kunwu Chan
To: sj@kernel.org, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, skhan@linuxfoundation.org
Cc: damon@lists.linux.dev, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Kunwu Chan, Wang Lian
Subject: [PATCH] Docs/damon: add TLB flush policy document
Date: Fri, 5 Jun 2026 11:10:08 +0800
Message-ID: <20260605031008.397328-1-kunwu.chan@linux.dev>

From: Kunwu Chan

DAMON avoids TLB flushes after clearing PTE Accessed bits for sampling.
The overhead was measured and found significant [1]. Production
workloads with large working sets flush TLB buffers naturally, so
accuracy impact is negligible.

On systems with large TLB buffers and small test workloads, stale TLB
entries persist across sampling intervals and produce false negatives.
This comes up repeatedly on the mailing list and in private inquiries
[2][3].

Add a document on the design decision, trade-offs, test environment
problems, and recommendations.

Link: https://lore.kernel.org/20200403103059.12762-1-sjpark@amazon.com [1]
Link: https://lore.kernel.org/20260117020731.226785-3-sj@kernel.org [2]
Link: https://lore.kernel.org/all/20260526145034.91594-1-sj@kernel.org [3]
Co-developed-by: Wang Lian
Signed-off-by: Wang Lian
Signed-off-by: Kunwu Chan
---
 Documentation/mm/damon/index.rst     |   1 +
 Documentation/mm/damon/tlb_flush.rst | 131 +++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)
 create mode 100644 Documentation/mm/damon/tlb_flush.rst

diff --git a/Documentation/mm/damon/index.rst b/Documentation/mm/damon/index.rst
index 318f6a7bfea4..5e239437dab3 100644
--- a/Documentation/mm/damon/index.rst
+++ b/Documentation/mm/damon/index.rst
@@ -19,6 +19,7 @@ DAMON is a Linux kernel subsystem for efficient :ref:`data access monitoring
 
    faq
    design
+   tlb_flush
    api
    maintainer-profile
 
diff --git a/Documentation/mm/damon/tlb_flush.rst b/Documentation/mm/damon/tlb_flush.rst
new file mode 100644
index 000000000000..394f7b86102a
--- /dev/null
+++ b/Documentation/mm/damon/tlb_flush.rst
@@ -0,0 +1,131 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+DAMON TLB Flush Policy
+==========================================
+
+:Author: Kunwu Chan
+:Author: Wang Lian
+
+Overview
+========
+
+DAMON monitors data access by sampling PTE (Page Table Entry) Accessed bits
+using ``ptep_test_and_clear_young()`` and ``pmdp_test_and_clear_young()``.
+These functions clear the Accessed bit but do **not** flush the TLB
+(Translation Lookaside Buffer).  This is an intentional design choice.
+
+Questions about this behavior come up repeatedly, both on the mailing list
+and in private inquiries.  This document describes the reasoning, the
+trade-offs, and recommendations for users and testers.
+
+Background
+==========
+
+DAMON's access check works as follows:
+
+1. Clear the PTE Accessed bit for a sampled page.
+2. Wait for one ``sampling interval``.
+3. Check if the Accessed bit has been set again by the hardware.
+
+If the bit was set again, the page was accessed during the sampling interval.
+
+On architectures with hardware-managed TLB (e.g., x86, arm64), the CPU may
+cache the Accessed bit state in the TLB.  After DAMON clears the Accessed bit
+in the page table, a stale TLB entry with the old Accessed bit remains in the
+TLB.  When the workload accesses the page, the access hits the stale TLB
+entry and does not trigger a page table walk, so the Accessed bit in the page
+table is not set again.  DAMON therefore fails to detect real accesses on its
+next check, reporting false negatives.
+
+Flushing the TLB after clearing the Accessed bit prevents stale TLB entries
+and eliminates this problem.  Functions such as ``ptep_clear_flush_young()`` and
+``pmdp_clear_flush_young()`` provide this behavior.  However, TLB flushes come
+at a performance cost.
+
+Why DAMON Does Not Flush TLB
+============================
+
+DAMON intentionally avoids TLB flushes to keep monitoring overhead low.
+The decision was made after measuring the performance impact of adding TLB
+flushes to the sampling path.  The measurement showed the overhead is
+significant enough to matter for production use [1]_.
+
+Production workloads typically have large working sets that flush TLB buffers
+anyway through normal memory access patterns.  Stale TLB entries that could
+cause monitoring inaccuracies are evicted by the workload's own memory activity
+before the next sampling interval.  The accuracy impact is therefore negligible in
+production.
+
+The following table summarizes the trade-off:
+
++---------------------+-----------------------------+---------------------------+
+|                     | Without TLB Flush (current) | With TLB Flush            |
++---------------------+-----------------------------+---------------------------+
+| Monitoring Overhead | Low                         | Higher (flush cost)       |
++---------------------+-----------------------------+---------------------------+
+| Accuracy (prod)     | Good                        | Good                      |
++---------------------+-----------------------------+---------------------------+
+| Accuracy (test)     | May degrade                 | Good                      |
++---------------------+-----------------------------+---------------------------+
+
+Impact on Testing and Small Workloads
+=====================================
+
+The lack of TLB flush becomes problematic when the working set is small enough
+to fit entirely within the TLB reach.  This is common in test environments
+and synthetic benchmarks.  In such cases, stale TLB entries persist across
+sampling intervals, so DAMON misses real accesses and monitoring results
+become incorrect.
+
+For example, on a machine with a large TLB buffer, a test workload of a few
+tens of megabytes may never experience TLB eviction.  DAMON's WSS (Working Set
+Size) estimation can report 100% error (all regions reported as accessed,
+or none reported as accessed depending on timing), and DAMOS schemes may never
+trigger correctly.
+
+This issue was observed in DAMON selftests and was addressed by increasing the
+test working set size to simulate production-like conditions, rather than
+changing DAMON's TLB flush behavior [2]_.  The selftest working set size was
+increased up to 160 MiB for this reason.
+
+Recommendations
+===============
+
+For Users
+---------
+
+If you observe unexpected ``nr_accesses`` values or inaccurate working
+set size estimates, a likely cause is stale TLB entries left by DAMON's
+sampling without TLB flushes.  This happens when the working set fits
+within the TLB reach, which is uncommon for production workloads but can
+occur with small workloads.  See the "For Testers and Developers" section
+below for how to verify this.
+
+For Testers and Developers
+--------------------------
+
+When writing DAMON tests, ensure the test workload's working set is large
+enough to trigger natural TLB eviction on the target test machine.  The
+exact size depends on the CPU's TLB configuration.  The DAMON selftest for
+WSS estimation uses 160 MiB per region after finding smaller sizes
+unreliable on systems with large TLB buffers [2]_.
+
+For out-of-tree tests, gradually increase the working set size until DAMON
+reports stable and accurate results, then use that size as the baseline for
+subsequent tests on the same hardware.
+
+If DAMON reports unexpectedly low ``nr_accesses`` or empty
+``tried_regions``, the ``diagnose_empty_tried_regions.py`` script from
+DAMON selftests can help determine whether stale TLB entries are the cause.
+
+The existing DAMON selftests use the sizing approach described above [2]_.
+
+References
+==========
+
+.. [1] `DAMON TLB flush overhead measurement
+   `_
+
+.. [2] `DAMON selftest: increase working set size for reliable results
+   `_
-- 
2.43.0
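
The stale-entry effect described in the Background section of the new
document can be illustrated with a toy model.  The sketch below is not kernel
code and is not part of this patch.  The names ``ToyTLB`` and
``detected_fraction`` are hypothetical, the TLB is assumed to be a simple LRU
cache, and every page is sampled at once, whereas DAMON samples one page per
region.  It follows the document's model in which a TLB hit never sets the
Accessed bit in the page table again.

```python
from collections import OrderedDict


class ToyTLB:
    """Hypothetical LRU-managed TLB; real TLB replacement policies differ."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, page, accessed_bits):
        if page in self.entries:
            # TLB hit: no page table walk, so the Accessed bit is untouched.
            self.entries.move_to_end(page)
            return
        # TLB miss: the page table walk sets the Accessed bit.
        accessed_bits[page] = True
        self.entries[page] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def flush(self, page):
        self.entries.pop(page, None)


def detected_fraction(working_set_pages, tlb_entries, flush=False, rounds=3):
    """Fraction of truly accessed pages seen as accessed after one interval."""
    accessed = {page: False for page in range(working_set_pages)}
    tlb = ToyTLB(tlb_entries)

    # Warm up: the workload touches its whole working set once.
    for page in range(working_set_pages):
        tlb.access(page, accessed)

    # Sampling step 1: clear the Accessed bits, optionally flushing the TLB.
    for page in accessed:
        accessed[page] = False
        if flush:
            tlb.flush(page)

    # Sampling step 2: the workload keeps running for one sampling interval.
    for _ in range(rounds):
        for page in range(working_set_pages):
            tlb.access(page, accessed)

    # Sampling step 3: check which Accessed bits were set again.
    return sum(accessed.values()) / working_set_pages


if __name__ == "__main__":
    print("small WSS, no flush:", detected_fraction(64, 128))
    print("small WSS, flush:   ", detected_fraction(64, 128, flush=True))
    print("large WSS, no flush:", detected_fraction(256, 128))
```

In this model, a 64-page working set on a 128-entry TLB yields no detected
accesses without a flush and full detection with one, while a 256-page working
set is fully detected even without flushing because the workload evicts its
own entries.  Real hardware is less clear-cut, so the numbers only illustrate
the direction of the effect.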