From nobody Mon Feb 9 15:10:06 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D3F11DE884 for ; Fri, 21 Mar 2025 17:37:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578654; cv=none; b=rJ7ukeTGO6MkL2xP0mjejkM0CjYRJglDQZBm3SU7MaFyLN+G3tD4EQsB6BVCotPKBIcIALg1irV8GTP1d5uzWEzJEpr6TDCPPuV1dZEHpWPvQx2IccYxsV0IP4Jl34SQlpp7G9nVajQJ9Gfc6G1TrTtcUoAWfLLGmoDSPhxWGoU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578654; c=relaxed/simple; bh=kuDJiYrASrg17oWfxRa+UXDWPQLL2nZttWFPUbMBMvE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=XqVVOcC936xjst5I9g3jwik1NPPg3y+j/3BsFvebsITjq2om3mlhi9iY2eSMJ+P6ly5fBJYfox/0zIqTvUgozJsCQbHp8A8dcY15nsP6lbcSiBD3EQo0gvd4JC+cNoAmOxjXgZUtIKmht9tWq4utfTUfc86jZUvDwluDyfQdLdg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=bwVfVYWZ; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="bwVfVYWZ" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2ff7aecba07so3588076a91.2 for ; Fri, 21 Mar 2025 10:37:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742578652; x=1743183452; 
darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=zIY5k7jRtzkaMHM23LwZQbgKSFYkNE5bEn5Jv/DROwA=; b=bwVfVYWZld82RLb9KSCkzCzGsQIpCRz7+nuaKdun/GjTxSkCliraxDqsyioHhbYpfk 9A83zA04UZQJVq2UlWQxn6OmgE/ChsZK1m+PcuIEYan0Tqic+u3tL6JGOGtkQeBgpuv3 xea7o30qENE1j1B29tPfOMrE3orCNcEFd1MKHS+pyb2zo2GLuTnET4GKoItq248WSh31 ZfTY6bQCqLXRrh9UIX0QGN4u+ShbXEeG9kTTVs6eHFomH/6Ko/CIDskwapDmDwxsOwmq m+sfAXaZ+lEv9vWNH5E8sFZQakJsFCP+Z/GO4mQaI+p0wvG5k34rej0IY5boijTZQ+4b ciug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742578652; x=1743183452; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=zIY5k7jRtzkaMHM23LwZQbgKSFYkNE5bEn5Jv/DROwA=; b=s6GL7Ubhky+Ji6lCaV+LKQ8amBh9twd7JzJC0CnKYMMk4Ee+TnKl7aVNTuNTTHSzlE YAtu7s2ql83+VXYLefUxvA55kZ4QgMxhBX7X8TR8qFz/18sgVi/UbchS/AqZIIeiGUp3 D8AEGSeN4MEv3Qn/PLHs/rQRJiSdT5GYlJVoT/Xvy7oLTkMd0xC0hJPQp7Ey1b7zMDcS 3BskvKCWpjLyLAn9S9zjYNVA+SLfDpd8rGkYbbNQsHf2KVbzDYnyro1TQZm0hk80LIB7 DdhD3098XUSPYRV+nCdUo+3wtSkGrSzs5WV777iyYFVLldsAjHBiDMv2N8/0+oxUXFCm 8DLQ== X-Forwarded-Encrypted: i=1; AJvYcCU2kHUr9GdI4+mUjuZBTeIUuA5/6c0UZiP6zHsP8zrfoDB4vNuVrzCz2R0gzBRM7rsxyzjDLqz3LqQkPaM=@vger.kernel.org X-Gm-Message-State: AOJu0YxFht4VQ8ZdICvyj6A84f2ZcodByq0cLEwa4OGf7kkALdWlSGKd moiEwNcnGk8KXZhhn8v8DC3vyeTp/+r3z2QTuY90Dp8qg6ThyrmKPQuRcZ2PK2YdkQa13ZmPoYK MrTLY+OMfKL1clzXDfHIh/Q== X-Google-Smtp-Source: AGHT+IE7r52beD3Ab1/77IHPkao7u16u+m0qxZHCccy5P9AkxUY4reihApSQ91Wq7XViRQXL1VyGvF83vZYuIuNbfw== X-Received: from pjj14.prod.google.com ([2002:a17:90b:554e:b0:2f5:63a:4513]) (user=souravpanda job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2707:b0:2ff:6488:e01c with SMTP id 98e67ed59e1d1-3030fefe3e4mr6880939a91.29.1742578652393; Fri, 21 Mar 2025 10:37:32 -0700 (PDT) Date: Fri, 21 Mar 2025 17:37:24 +0000 In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com> 
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20250321173729.3175898-1-souravpanda@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321173729.3175898-2-souravpanda@google.com>
Subject: [RFC PATCH 1/6] mm: introduce SELECTIVE_KSM KConfig
From: Sourav Panda
To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com,
 pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com,
 gthelen@google.com, souravpanda@google.com, surenb@google.com
Content-Type: text/plain; charset="utf-8"

Gate the partitioned and synchronous features of SELECTIVE_KSM behind a
Kconfig option. This prevents vanilla KSM's background thread from
interfering with SELECTIVE_KSM.

Signed-off-by: Sourav Panda
---
 mm/Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..f9873002414c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -783,6 +783,17 @@ config KSM
 	  until a program has madvised that an area is MADV_MERGEABLE, and
 	  root has set /sys/kernel/mm/ksm/run to 1 (if CONFIG_SYSFS is set).
 
+config SELECTIVE_KSM
+	bool "Enable Selective KSM for page merging"
+	depends on KSM
+	help
+	  Enable Synchronous and Partitioned KSM for page merging. There is
+	  no background scanning. Instead, userspace specifies the pid
+	  and address range to have merged. The partitioning aspect divides
+	  the merge space into security domains. Merging of pages only takes
+	  place within a partition, improving security. Furthermore, trees
+	  in each partition become smaller, improving CPU efficiency.
+
 config DEFAULT_MMAP_MIN_ADDR
 	int "Low address space to protect from user allocation"
 	depends on MMU
-- 
2.49.0.395.g12beb8f557-goog

From nobody Mon Feb 9 15:10:06 2026
Date: Fri, 21 Mar 2025 17:37:25 +0000
In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20250321173729.3175898-1-souravpanda@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321173729.3175898-3-souravpanda@google.com>
Subject: [RFC PATCH 2/6] mm: make Selective KSM synchronous
From: Sourav Panda
To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com,
 pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com,
 gthelen@google.com, souravpanda@google.com, surenb@google.com
Content-Type: text/plain; charset="utf-8"

Make KSM synchronous by introducing the following sysfs file, which
carries out merging on the specified memory region synchronously and
eliminates the need for ksmd to run in the background.
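
For illustration only, a userspace caller could build and submit a request
roughly as sketched below. Both helper names are hypothetical and not part
of this series. The request layout follows the "%d %lx %lx" format that
trigger_merge_store() in this patch parses, so the addresses are written
in hexadecimal. The sysfs path is passed in by the caller and only exists
on a kernel carrying this RFC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch): format one request line
 * as "pid start end", with both addresses in hex to match "%d %lx %lx".
 * Returns the number of characters written, or -1 on a bad range or a
 * buffer that is too small.
 */
int format_trigger_merge_request(char *buf, size_t len, pid_t pid,
				 unsigned long start, unsigned long end)
{
	int n;

	assert(buf != NULL);
	if (start >= end)	/* the store handler rejects this with -EINVAL */
		return -1;

	n = snprintf(buf, len, "%d %lx %lx\n", (int)pid, start, end);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Hypothetical helper: write the request to the sysfs file at @path.
 * Returns 0 on success, -1 on any formatting or I/O failure.
 */
int trigger_merge(const char *path, pid_t pid,
		  unsigned long start, unsigned long end)
{
	char req[64];
	FILE *f;
	int ok;

	if (format_trigger_merge_request(req, sizeof(req), pid, start, end) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(req, f) >= 0;
	/* stdio buffers the write, so a kernel error may only surface here */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass the path of the trigger_merge file, the target pid and
the address range. Because the handler runs the merge synchronously, the
write is expected to return only after the attempt has finished.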

	echo "pid start_addr end_addr" > /sys/kernel/mm/ksm/trigger_merge

Signed-off-by: Sourav Panda
---
 mm/ksm.c | 317 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 271 insertions(+), 46 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 8be2b144fefd..b2f184557ed9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -290,16 +290,18 @@ static unsigned int zero_checksum __read_mostly;
 /* Whether to merge empty (zeroed) pages with actual zero pages */
 static bool ksm_use_zero_pages __read_mostly;
 
-/* Skip pages that couldn't be de-duplicated previously */
-/* Default to true at least temporarily, for testing */
-static bool ksm_smart_scan = true;
-
 /* The number of zero pages which is placed by KSM */
 atomic_long_t ksm_zero_pages = ATOMIC_LONG_INIT(0);
 
 /* The number of pages that have been skipped due to "smart scanning" */
 static unsigned long ksm_pages_skipped;
 
+#ifndef CONFIG_SELECTIVE_KSM /* advisor immaterial if there is no scanning */
+
+/* Skip pages that couldn't be de-duplicated previously */
+/* Default to true at least temporarily, for testing */
+static bool ksm_smart_scan = true;
+
 /* Don't scan more than max pages per batch. */
 static unsigned long ksm_advisor_max_pages_to_scan = 30000;
 
@@ -465,6 +467,7 @@ static void advisor_stop_scan(void)
 	if (ksm_advisor == KSM_ADVISOR_SCAN_TIME)
 		scan_time_advisor();
 }
+#endif /* CONFIG_SELECTIVE_KSM */
 
 #ifdef CONFIG_NUMA
 /* Zeroed when merging across nodes is not allowed */
@@ -957,6 +960,25 @@ static struct folio *ksm_get_folio(struct ksm_stable_node *stable_node,
 	return NULL;
 }
 
+static unsigned char get_rmap_item_age(struct ksm_rmap_item *rmap_item)
+{
+#ifdef CONFIG_SELECTIVE_KSM /* age is immaterial in selective ksm */
+	return 0;
+#else
+	unsigned char age;
+	/*
+	 * Usually ksmd can and must skip the rb_erase, because
+	 * root_unstable_tree was already reset to RB_ROOT.
+	 * But be careful when an mm is exiting: do the rb_erase
+	 * if this rmap_item was inserted by this scan, rather
+	 * than left over from before.
+	 */
+	age = (unsigned char)(ksm_scan.seqnr - rmap_item->address);
+	WARN_ON_ONCE(age > 1);
+	return age;
+#endif /* CONFIG_SELECTIVE_KSM */
+}
+
 /*
  * Removing rmap_item from stable or unstable tree.
  * This function will clean the information from the stable/unstable tree.
@@ -991,16 +1013,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
 		rmap_item->address &= PAGE_MASK;
 
 	} else if (rmap_item->address & UNSTABLE_FLAG) {
-		unsigned char age;
-		/*
-		 * Usually ksmd can and must skip the rb_erase, because
-		 * root_unstable_tree was already reset to RB_ROOT.
-		 * But be careful when an mm is exiting: do the rb_erase
-		 * if this rmap_item was inserted by this scan, rather
-		 * than left over from before.
-		 */
-		age = (unsigned char)(ksm_scan.seqnr - rmap_item->address);
-		BUG_ON(age > 1);
+		unsigned char age = get_rmap_item_age(rmap_item);
 		if (!age)
 			rb_erase(&rmap_item->node,
 				 root_unstable_tree + NUMA(rmap_item->nid));
@@ -2203,6 +2216,37 @@ static void stable_tree_append(struct ksm_rmap_item *rmap_item,
 	rmap_item->mm->ksm_merging_pages++;
 }
 
+#ifdef CONFIG_SELECTIVE_KSM
+static int update_checksum(struct page *page, struct ksm_rmap_item *rmap_item)
+{
+	/*
+	 * Typically KSM would wait for a second round to even consider
+	 * the page for unstable tree insertion to ascertain its stability.
+	 * Avoid this when using selective ksm.
+	 */
+	rmap_item->oldchecksum = calc_checksum(page);
+	return 0;
+}
+#else
+static int update_checksum(struct page *page, struct ksm_rmap_item *rmap_item)
+{
+	remove_rmap_item_from_tree(rmap_item);
+
+	/*
+	 * If the hash value of the page has changed from the last time
+	 * we calculated it, this page is changing frequently: therefore we
+	 * don't want to insert it in the unstable tree, and we don't want
+	 * to waste our time searching for something identical to it there.
+	 */
+	unsigned int checksum = calc_checksum(page);
+	if (rmap_item->oldchecksum != checksum) {
+		rmap_item->oldchecksum = checksum;
+		return -EINVAL;
+	}
+	return 0;
+}
+#endif
+
 /*
  * cmp_and_merge_page - first see if page can be merged into the stable tree;
  * if not, compare checksum to previous and if it's the same, see if page can
@@ -2218,7 +2262,6 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
 	struct page *tree_page = NULL;
 	struct ksm_stable_node *stable_node;
 	struct folio *kfolio;
-	unsigned int checksum;
 	int err;
 	bool max_page_sharing_bypass = false;
 
@@ -2241,20 +2284,8 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
 		if (!is_page_sharing_candidate(stable_node))
 			max_page_sharing_bypass = true;
 	} else {
-		remove_rmap_item_from_tree(rmap_item);
-
-		/*
-		 * If the hash value of the page has changed from the last time
-		 * we calculated it, this page is changing frequently: therefore we
-		 * don't want to insert it in the unstable tree, and we don't want
-		 * to waste our time searching for something identical to it there.
-		 */
-		checksum = calc_checksum(page);
-		if (rmap_item->oldchecksum != checksum) {
-			rmap_item->oldchecksum = checksum;
+		if (update_checksum(page, rmap_item))
 			return;
-		}
-
 		if (!try_to_merge_with_zero_page(rmap_item, page))
 			return;
 	}
@@ -2379,6 +2410,111 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
 	return rmap_item;
 }
 
+#ifdef CONFIG_SELECTIVE_KSM
+static struct ksm_rmap_item *retrieve_rmap_item(struct page **page,
+						struct mm_struct *mm,
+						unsigned long start,
+						unsigned long end)
+{
+	struct ksm_mm_slot *mm_slot;
+	struct mm_slot *slot;
+	struct vm_area_struct *vma;
+	struct ksm_rmap_item *rmap_item;
+	struct vma_iterator vmi;
+
+	lru_add_drain_all();
+
+	if (!ksm_merge_across_nodes) {
+		struct ksm_stable_node *stable_node, *next;
+		struct folio *folio;
+
+		list_for_each_entry_safe(stable_node, next,
+					 &migrate_nodes, list) {
+			folio = ksm_get_folio(stable_node, KSM_GET_FOLIO_NOLOCK);
+			if (folio)
+				folio_put(folio);
+		}
+	}
+
+	spin_lock(&ksm_mmlist_lock);
+	slot = mm_slot_lookup(mm_slots_hash, mm);
+	spin_unlock(&ksm_mmlist_lock);
+
+	if (!slot)
+		return NULL;
+	mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
+
+	ksm_scan.address = 0;
+	ksm_scan.mm_slot = mm_slot;
+	ksm_scan.rmap_list = &mm_slot->rmap_list;
+
+	vma_iter_init(&vmi, mm, ksm_scan.address);
+
+	mmap_read_lock(mm);
+	for_each_vma(vmi, vma) {
+		if (!(vma->vm_flags & VM_MERGEABLE))
+			continue;
+		if (ksm_scan.address < vma->vm_start)
+			ksm_scan.address = vma->vm_start;
+		if (!vma->anon_vma)
+			ksm_scan.address = vma->vm_end;
+
+		while (ksm_scan.address < vma->vm_end) {
+			struct page *tmp_page = NULL;
+			struct folio_walk fw;
+			struct folio *folio;
+
+			if (ksm_scan.address < start || ksm_scan.address > end)
+				break;
+
+			folio = folio_walk_start(&fw, vma, ksm_scan.address, 0);
+			if (folio) {
+				if (!folio_is_zone_device(folio) &&
+				    folio_test_anon(folio)) {
+					folio_get(folio);
+					tmp_page = fw.page;
+				}
+				folio_walk_end(&fw, vma);
+			}
+
+			if (tmp_page) {
+				flush_anon_page(vma, tmp_page, ksm_scan.address);
+				flush_dcache_page(tmp_page);
+				rmap_item = get_next_rmap_item(mm_slot,
+							       ksm_scan.rmap_list,
+							       ksm_scan.address);
+				if (rmap_item) {
+					ksm_scan.rmap_list =
+						&rmap_item->rmap_list;
+					ksm_scan.address += PAGE_SIZE;
+					*page = tmp_page;
+				} else {
+					folio_put(folio);
+				}
+				mmap_read_unlock(mm);
+				return rmap_item;
+			}
+			ksm_scan.address += PAGE_SIZE;
+		}
+	}
+	mmap_read_unlock(mm);
+	return NULL;
+}
+
+static void ksm_sync_merge(struct mm_struct *mm,
+			   unsigned long start, unsigned long end)
+{
+	struct ksm_rmap_item *rmap_item;
+	struct page *page;
+
+	rmap_item = retrieve_rmap_item(&page, mm, start, end);
+	if (!rmap_item)
+		return;
+	cmp_and_merge_page(page, rmap_item);
+	put_page(page);
+}
+
+#else /* CONFIG_SELECTIVE_KSM */
 /*
  * Calculate skip age for the ksm page age. The age determines how often
  * de-duplicating has already been tried unsuccessfully. If the age is
@@ -2688,6 +2824,7 @@ static int ksm_scan_thread(void *nothing)
 	}
 	return 0;
 }
+#endif /* CONFIG_SELECTIVE_KSM */
 
 static void __ksm_add_vma(struct vm_area_struct *vma)
 {
@@ -3335,9 +3472,10 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
 	unsigned int nr_pages;
 	int err;
 
+#ifndef CONFIG_SELECTIVE_KSM
 	if (ksm_advisor != KSM_ADVISOR_NONE)
 		return -EINVAL;
-
+#endif
 	err = kstrtouint(buf, 10, &nr_pages);
 	if (err)
 		return -EINVAL;
@@ -3396,6 +3534,65 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 }
 KSM_ATTR(run);
 
+static ssize_t trigger_merge_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return -EINVAL;	/* Not yet implemented */
+}
+
+static ssize_t trigger_merge_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t count)
+{
+	unsigned long start, end;
+	pid_t pid;
+	char *input, *ptr;
+	int ret;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	input = kstrdup(buf, GFP_KERNEL);
+	if (!input)
+		return -ENOMEM;
+
+	ptr = strim(input);
+	ret = sscanf(ptr, "%d %lx %lx", &pid, &start, &end);
+	kfree(input);
+
+	if (ret != 3)
+		return -EINVAL;
+
+	if (start >= end)
+		return -EINVAL;
+
+	/* Find the mm_struct */
+	rcu_read_lock();
+	task = find_task_by_vpid(pid);
+	if (!task) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+
+	get_task_struct(task);
+
+	rcu_read_unlock();
+	mm = get_task_mm(task);
+	put_task_struct(task);
+
+	if (!mm)
+		return -EINVAL;
+
+	mutex_lock(&ksm_thread_mutex);
+	wait_while_offlining();
+	ksm_sync_merge(mm, start, end);
+	mutex_unlock(&ksm_thread_mutex);
+
+	mmput(mm);
+	return count;
+}
+KSM_ATTR(trigger_merge);
+
 #ifdef CONFIG_NUMA
 static ssize_t merge_across_nodes_show(struct kobject *kobj,
 				       struct kobj_attribute *attr, char *buf)
@@ -3635,6 +3832,7 @@ static ssize_t full_scans_show(struct kobject *kobj,
 }
 KSM_ATTR_RO(full_scans);
 
+#ifndef CONFIG_SELECTIVE_KSM
 static ssize_t smart_scan_show(struct kobject *kobj,
 			       struct kobj_attribute *attr, char *buf)
 {
@@ -3780,11 +3978,13 @@ static ssize_t advisor_target_scan_time_store(struct kobject *kobj,
 	return count;
 }
 KSM_ATTR(advisor_target_scan_time);
+#endif /* CONFIG_SELECTIVE_KSM */
 
 static struct attribute *ksm_attrs[] = {
 	&sleep_millisecs_attr.attr,
 	&pages_to_scan_attr.attr,
 	&run_attr.attr,
+	&trigger_merge_attr.attr,
 	&pages_scanned_attr.attr,
 	&pages_shared_attr.attr,
 	&pages_sharing_attr.attr,
@@ -3802,12 +4002,14 @@ static struct attribute *ksm_attrs[] = {
 	&stable_node_chains_prune_millisecs_attr.attr,
 	&use_zero_pages_attr.attr,
 	&general_profit_attr.attr,
+#ifndef CONFIG_SELECTIVE_KSM
 	&smart_scan_attr.attr,
 	&advisor_mode_attr.attr,
 	&advisor_max_cpu_attr.attr,
 	&advisor_min_pages_to_scan_attr.attr,
 	&advisor_max_pages_to_scan_attr.attr,
 	&advisor_target_scan_time_attr.attr,
+#endif
 	NULL,
 };
 
@@ -3815,40 +4017,63 @@ static const struct attribute_group ksm_attr_group = {
 	.attrs = ksm_attrs,
 	.name = "ksm",
 };
+
+static int __init ksm_sysfs_init(void)
+{
+	return sysfs_create_group(mm_kobj, &ksm_attr_group);
+}
+#else /* CONFIG_SYSFS */
+static int __init ksm_sysfs_init(void)
+{
+	ksm_run = KSM_RUN_MERGE;	/* no way for user to start it */
+	return 0;
+}
 #endif /* CONFIG_SYSFS */
 
-static int __init ksm_init(void)
+#ifdef CONFIG_SELECTIVE_KSM
+static int __init ksm_thread_sysfs_init(void)
+{
+	return ksm_sysfs_init();
+}
+#else /* CONFIG_SELECTIVE_KSM */
+static int __init ksm_thread_sysfs_init(void)
 {
 	struct task_struct *ksm_thread;
 	int err;
 
-	/* The correct value depends on page size and endianness */
-	zero_checksum = calc_checksum(ZERO_PAGE(0));
-	/* Default to false for backwards compatibility */
-	ksm_use_zero_pages = false;
-
-	err = ksm_slab_init();
-	if (err)
-		goto out;
-
 	ksm_thread = kthread_run(ksm_scan_thread, NULL, "ksmd");
 	if (IS_ERR(ksm_thread)) {
 		pr_err("ksm: creating kthread failed\n");
 		err = PTR_ERR(ksm_thread);
-		goto out_free;
+		return err;
 	}
 
-#ifdef CONFIG_SYSFS
-	err = sysfs_create_group(mm_kobj, &ksm_attr_group);
+	err = ksm_sysfs_init();
 	if (err) {
 		pr_err("ksm: register sysfs failed\n");
 		kthread_stop(ksm_thread);
-		goto out_free;
 	}
-#else
-	ksm_run = KSM_RUN_MERGE;	/* no way for user to start it */
 
-#endif /* CONFIG_SYSFS */
+	return err;
+}
+#endif /* CONFIG_SELECTIVE_KSM */
+
+static int __init ksm_init(void)
+{
+	int err;
+
+	/* The correct value depends on page size and endianness */
+	zero_checksum = calc_checksum(ZERO_PAGE(0));
+	/* Default to false for backwards compatibility */
+	ksm_use_zero_pages = false;
+
+	err = ksm_slab_init();
+	if (err)
+		goto out;
+
+	err = ksm_thread_sysfs_init();
+	if (err)
+		goto out_free;
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	/* There is no significance to this priority 100 */
-- 
2.49.0.395.g12beb8f557-goog

From nobody Mon Feb 9 15:10:06 2026
Date: Fri, 21 Mar 2025 17:37:26 +0000
In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20250321173729.3175898-1-souravpanda@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321173729.3175898-4-souravpanda@google.com>
Subject: [RFC PATCH 3/6] mm: make Selective KSM partitioned
From: Sourav Panda
To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com,
 pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com,
 gthelen@google.com, souravpanda@google.com, surenb@google.com
Content-Type: text/plain; charset="utf-8"

Create a sysfs interface to partition the KSM merge space. Add a new
sysfs file, add_partition, which is used to specify the name of a new
partition. Once a partition is created, the files typically available in
KSM appear under that partition.

These sysfs interface changes are in preparation for the following patch,
which actually partitions the merge space (e.g., prevents page comparison
and merging across partitions).

	KSM_SYSFS=/sys/kernel/mm/ksm

	echo "part_1" > ${KSM_SYSFS}/control/add_partition

	ls ${KSM_SYSFS}/part_1/
	pages_scanned pages_to_scan sleep_millisecs ...

	echo "pid start_addr end_addr" > ${KSM_SYSFS}/part_1/trigger_merge

Signed-off-by: Sourav Panda
---
 mm/ksm.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 95 insertions(+), 6 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b2f184557ed9..927e257c48b5 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3832,7 +3832,17 @@ static ssize_t full_scans_show(struct kobject *kobj,
 }
 KSM_ATTR_RO(full_scans);
 
-#ifndef CONFIG_SELECTIVE_KSM
+#ifdef CONFIG_SELECTIVE_KSM
+static struct kobject *ksm_base_kobj;
+
+struct partition_kobj {
+	struct kobject *kobj;
+	struct list_head list;
+};
+
+static LIST_HEAD(partition_list);
+
+#else /* CONFIG_SELECTIVE_KSM */
 static ssize_t smart_scan_show(struct kobject *kobj,
 			       struct kobj_attribute *attr, char *buf)
 {
@@ -4015,15 +4025,22 @@ static struct attribute *ksm_attrs[] = {
 
 static const struct attribute_group ksm_attr_group = {
 	.attrs = ksm_attrs,
+#ifndef CONFIG_SELECTIVE_KSM
 	.name = "ksm",
+#endif
 };
 
-static int __init ksm_sysfs_init(void)
+static int __init ksm_sysfs_init(struct kobject *kobj,
+				 const struct attribute_group *grp)
 {
-	return sysfs_create_group(mm_kobj, &ksm_attr_group);
+	int err;
+
+	err = sysfs_create_group(kobj, grp);
+	return err;
 }
 #else /* CONFIG_SYSFS */
-static int __init ksm_sysfs_init(void)
+static int __init ksm_sysfs_init(struct kobject *kobj,
+				 const struct attribute_group *grp)
 {
 	ksm_run = KSM_RUN_MERGE;	/* no way for user to start it */
 	return 0;
@@ -4031,9 +4048,81 @@ static int __init ksm_sysfs_init(void)
 #endif /* CONFIG_SYSFS */
 
 #ifdef CONFIG_SELECTIVE_KSM
+static ssize_t add_partition_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct partition_kobj *new_partition_kobj;
+	char partition_name[50];
+	int err;
+
+	mutex_lock(&ksm_thread_mutex);
+
+	if (count >= sizeof(partition_name)) {
+		err = -EINVAL;	/* Prevent buffer overflow */
+		goto unlock;
+	}
+
+	snprintf(partition_name, sizeof(partition_name),
+		 "%.*s", (int)(count - 1), buf);	/* Remove newline */
+
+	/* Allocate memory for new dynamic kobject entry */
+	new_partition_kobj = kmalloc(sizeof(*new_partition_kobj), GFP_KERNEL);
+	if (!new_partition_kobj) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	new_partition_kobj->kobj = kobject_create_and_add(partition_name,
+							  ksm_base_kobj);
+	if (!new_partition_kobj->kobj) {
+		kfree(new_partition_kobj);
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	err = sysfs_create_group(new_partition_kobj->kobj, &ksm_attr_group);
+	if (err) {
+		pr_err("ksm: register sysfs failed\n");
+		kfree(new_partition_kobj);
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	list_add(&new_partition_kobj->list, &partition_list);
+
+unlock:
+	mutex_unlock(&ksm_thread_mutex);
+	return err ? err : count;
+}
+
+static struct kobj_attribute add_kobj_attr = __ATTR(add_partition, 0220, NULL,
+						    add_partition_store);
+
+/* Array of attributes for base kobject */
+static struct attribute *ksm_base_attrs[] = {
+	&add_kobj_attr.attr,
+	NULL,	/* NULL-terminated */
+};
+
+/* Attribute group for base kobject */
+static struct attribute_group ksm_base_attr_group = {
+	.name = "control",
+	.attrs = ksm_base_attrs,
+};
+
 static int __init ksm_thread_sysfs_init(void)
 {
-	return ksm_sysfs_init();
+	int err;
+
+	ksm_base_kobj = kobject_create_and_add("ksm", mm_kobj);
+	if (!ksm_base_kobj) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	err = ksm_sysfs_init(ksm_base_kobj, &ksm_base_attr_group);
+	return err;
 }
 #else /* CONFIG_SELECTIVE_KSM */
 static int __init ksm_thread_sysfs_init(void)
@@ -4048,7 +4137,7 @@ static int __init ksm_thread_sysfs_init(void)
 		return err;
 	}
 
-	err = ksm_sysfs_init();
+	err = ksm_sysfs_init(mm_kobj, &ksm_attr_group);
 	if (err) {
 		pr_err("ksm: register sysfs failed\n");
 		kthread_stop(ksm_thread);
-- 
2.49.0.395.g12beb8f557-goog

From nobody Mon Feb 9 15:10:06 2026
Date: Fri, 21 Mar 2025 17:37:27 +0000
In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References:
<20250321173729.3175898-1-souravpanda@google.com> X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog Message-ID: <20250321173729.3175898-5-souravpanda@google.com> Subject: [RFC PATCH 4/6] mm: create dedicated trees for SELECTIVE KSM partitions From: Sourav Panda To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com, pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com, gthelen@google.com, souravpanda@google.com, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Extend ksm to create dedicated unstable and stable trees for each partition. Signed-off-by: Sourav Panda --- mm/ksm.c | 165 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 111 insertions(+), 54 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index 927e257c48b5..b575250aaf45 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -144,6 +144,28 @@ struct ksm_scan { unsigned long seqnr; }; =20 +static struct kobject *ksm_base_kobj; + +struct partition_kobj { + struct kobject *kobj; + struct list_head list; + struct rb_root *root_stable_tree; + struct rb_root *root_unstable_tree; +}; + +static LIST_HEAD(partition_list); + +static struct partition_kobj *find_partition_by_kobj(struct kobject *kobj) +{ + struct partition_kobj *partition; + + list_for_each_entry(partition, &partition_list, list) { + if (partition->kobj =3D=3D kobj) + return partition; + } + return NULL; +} + /** * struct ksm_stable_node - node of the stable rbtree * @node: rb node of this ksm page in the stable tree @@ -182,6 +204,7 @@ struct ksm_stable_node { #ifdef CONFIG_NUMA int nid; #endif + struct partition_kobj *partition; }; =20 /** @@ -218,6 +241,7 @@ struct ksm_rmap_item { struct hlist_node hlist; }; }; + struct partition_kobj *partition; }; =20 #define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */ @@ -227,8 +251,6 @@ struct ksm_rmap_item { /* The stable and 
unstable tree heads */ static struct rb_root one_stable_tree[1] =3D { RB_ROOT }; static struct rb_root one_unstable_tree[1] =3D { RB_ROOT }; -static struct rb_root *root_stable_tree =3D one_stable_tree; -static struct rb_root *root_unstable_tree =3D one_unstable_tree; =20 /* Recently migrated nodes of stable tree, pending proper placement */ static LIST_HEAD(migrate_nodes); @@ -555,7 +577,7 @@ static inline void stable_node_dup_del(struct ksm_stabl= e_node *dup) if (is_stable_node_dup(dup)) __stable_node_dup_del(dup); else - rb_erase(&dup->node, root_stable_tree + NUMA(dup->nid)); + rb_erase(&dup->node, dup->partition->root_stable_tree + NUMA(dup->nid)); #ifdef CONFIG_DEBUG_VM dup->head =3D NULL; #endif @@ -580,14 +602,20 @@ static inline void free_rmap_item(struct ksm_rmap_ite= m *rmap_item) kmem_cache_free(rmap_item_cache, rmap_item); } =20 -static inline struct ksm_stable_node *alloc_stable_node(void) +static inline struct ksm_stable_node *alloc_stable_node(struct partition_k= obj *partition) { /* * The allocation can take too long with GFP_KERNEL when memory is under * pressure, which may lead to hung task warnings. Adding __GFP_HIGH * grants access to memory reserves, helping to avoid this problem. 
*/ - return kmem_cache_alloc(stable_node_cache, GFP_KERNEL | __GFP_HIGH); + struct ksm_stable_node *node =3D kmem_cache_alloc(stable_node_cache, + GFP_KERNEL | __GFP_HIGH); + + if (node) + node->partition =3D partition; + + return node; } =20 static inline void free_stable_node(struct ksm_stable_node *stable_node) @@ -777,9 +805,10 @@ static inline int get_kpfn_nid(unsigned long kpfn) } =20 static struct ksm_stable_node *alloc_stable_node_chain(struct ksm_stable_n= ode *dup, - struct rb_root *root) + struct rb_root *root, + struct partition_kobj *partition) { - struct ksm_stable_node *chain =3D alloc_stable_node(); + struct ksm_stable_node *chain =3D alloc_stable_node(partition); VM_BUG_ON(is_stable_node_chain(dup)); if (likely(chain)) { INIT_HLIST_HEAD(&chain->hlist); @@ -1016,7 +1045,8 @@ static void remove_rmap_item_from_tree(struct ksm_rma= p_item *rmap_item) unsigned char age =3D get_rmap_item_age(rmap_item); if (!age) rb_erase(&rmap_item->node, - root_unstable_tree + NUMA(rmap_item->nid)); + rmap_item->partition->root_unstable_tree + + NUMA(rmap_item->nid)); ksm_pages_unshared--; rmap_item->address &=3D PAGE_MASK; } @@ -1154,17 +1184,23 @@ static int remove_all_stable_nodes(void) struct ksm_stable_node *stable_node, *next; int nid; int err =3D 0; - - for (nid =3D 0; nid < ksm_nr_node_ids; nid++) { - while (root_stable_tree[nid].rb_node) { - stable_node =3D rb_entry(root_stable_tree[nid].rb_node, - struct ksm_stable_node, node); - if (remove_stable_node_chain(stable_node, - root_stable_tree + nid)) { - err =3D -EBUSY; - break; /* proceed to next nid */ + struct partition_kobj *partition; + struct rb_root *root_stable_tree; + + list_for_each_entry(partition, &partition_list, list) { + root_stable_tree =3D partition->root_stable_tree; + + for (nid =3D 0; nid < ksm_nr_node_ids; nid++) { + while (root_stable_tree[nid].rb_node) { + stable_node =3D rb_entry(root_stable_tree[nid].rb_node, + struct ksm_stable_node, node); + if (remove_stable_node_chain(stable_node, + 
root_stable_tree + nid)) { + err =3D -EBUSY; + break; /* proceed to next nid */ + } + cond_resched(); } - cond_resched(); } } list_for_each_entry_safe(stable_node, next, &migrate_nodes, list) { @@ -1802,7 +1838,8 @@ static __always_inline struct folio *chain(struct ksm= _stable_node **s_n_d, * This function returns the stable tree node of identical content if foun= d, * -EBUSY if the stable node's page is being migrated, NULL otherwise. */ -static struct folio *stable_tree_search(struct page *page) +static struct folio *stable_tree_search(struct page *page, + struct partition_kobj *partition) { int nid; struct rb_root *root; @@ -1821,7 +1858,7 @@ static struct folio *stable_tree_search(struct page *= page) } =20 nid =3D get_kpfn_nid(folio_pfn(folio)); - root =3D root_stable_tree + nid; + root =3D partition->root_stable_tree + nid; again: new =3D &root->rb_node; parent =3D NULL; @@ -1991,7 +2028,7 @@ static struct folio *stable_tree_search(struct page *= page) VM_BUG_ON(is_stable_node_dup(stable_node_dup)); /* chain is missing so create it */ stable_node =3D alloc_stable_node_chain(stable_node_dup, - root); + root, partition); if (!stable_node) return NULL; } @@ -2016,7 +2053,8 @@ static struct folio *stable_tree_search(struct page *= page) * This function returns the stable tree node just allocated on success, * NULL otherwise. 
*/ -static struct ksm_stable_node *stable_tree_insert(struct folio *kfolio) +static struct ksm_stable_node *stable_tree_insert(struct folio *kfolio, + struct partition_kobj *partition) { int nid; unsigned long kpfn; @@ -2028,7 +2066,7 @@ static struct ksm_stable_node *stable_tree_insert(str= uct folio *kfolio) =20 kpfn =3D folio_pfn(kfolio); nid =3D get_kpfn_nid(kpfn); - root =3D root_stable_tree + nid; + root =3D partition->root_stable_tree + nid; again: parent =3D NULL; new =3D &root->rb_node; @@ -2067,7 +2105,7 @@ static struct ksm_stable_node *stable_tree_insert(str= uct folio *kfolio) } } =20 - stable_node_dup =3D alloc_stable_node(); + stable_node_dup =3D alloc_stable_node(partition); if (!stable_node_dup) return NULL; =20 @@ -2082,7 +2120,8 @@ static struct ksm_stable_node *stable_tree_insert(str= uct folio *kfolio) if (!is_stable_node_chain(stable_node)) { struct ksm_stable_node *orig =3D stable_node; /* chain is missing so create it */ - stable_node =3D alloc_stable_node_chain(orig, root); + stable_node =3D alloc_stable_node_chain(orig, root, + partition); if (!stable_node) { free_stable_node(stable_node_dup); return NULL; @@ -2121,7 +2160,7 @@ struct ksm_rmap_item *unstable_tree_search_insert(str= uct ksm_rmap_item *rmap_ite int nid; =20 nid =3D get_kpfn_nid(page_to_pfn(page)); - root =3D root_unstable_tree + nid; + root =3D rmap_item->partition->root_unstable_tree + nid; new =3D &root->rb_node; =20 while (*new) { @@ -2291,7 +2330,7 @@ static void cmp_and_merge_page(struct page *page, str= uct ksm_rmap_item *rmap_ite } =20 /* Start by searching for the folio in the stable tree */ - kfolio =3D stable_tree_search(page); + kfolio =3D stable_tree_search(page, rmap_item->partition); if (&kfolio->page =3D=3D page && rmap_item->head =3D=3D stable_node) { folio_put(kfolio); return; @@ -2344,7 +2383,8 @@ static void cmp_and_merge_page(struct page *page, str= uct ksm_rmap_item *rmap_ite * node in the stable tree and add both rmap_items. 
*/ folio_lock(kfolio); - stable_node =3D stable_tree_insert(kfolio); + stable_node =3D stable_tree_insert(kfolio, + rmap_item->partition); if (stable_node) { stable_tree_append(tree_rmap_item, stable_node, false); @@ -2502,7 +2542,8 @@ static struct ksm_rmap_item *retrieve_rmap_item(struc= t page **page, } =20 static void ksm_sync_merge(struct mm_struct *mm, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + struct partition_kobj *partition) { struct ksm_rmap_item *rmap_item; struct page *page; @@ -2510,6 +2551,7 @@ static void ksm_sync_merge(struct mm_struct *mm, rmap_item =3D retrieve_rmap_item(&page, mm, start, end); if (!rmap_item) return; + rmap_item->partition =3D partition; cmp_and_merge_page(page, rmap_item); put_page(page); } @@ -3328,19 +3370,23 @@ static void ksm_check_stable_tree(unsigned long sta= rt_pfn, struct ksm_stable_node *stable_node, *next; struct rb_node *node; int nid; - - for (nid =3D 0; nid < ksm_nr_node_ids; nid++) { - node =3D rb_first(root_stable_tree + nid); - while (node) { - stable_node =3D rb_entry(node, struct ksm_stable_node, node); - if (stable_node_chain_remove_range(stable_node, - start_pfn, end_pfn, - root_stable_tree + - nid)) - node =3D rb_first(root_stable_tree + nid); - else - node =3D rb_next(node); - cond_resched(); + struct rb_root *root_stable_tree; + struct partition_kobj *partition; + list_for_each_entry(partition, &partition_list, list) { + root_stable_tree =3D partition->root_stable_tree; + + for (nid =3D 0; nid < ksm_nr_node_ids; nid++) { + node =3D rb_first(root_stable_tree + nid); + while (node) { + stable_node =3D rb_entry(node, struct ksm_stable_node, node); + if (stable_node_chain_remove_range(stable_node, + start_pfn, end_pfn, + root_stable_tree + nid)) + node =3D rb_first(root_stable_tree + nid); + else + node =3D rb_next(node); + cond_resched(); + } } } list_for_each_entry_safe(stable_node, next, &migrate_nodes, list) { @@ -3551,6 +3597,7 @@ static ssize_t trigger_merge_store(struct kobject *ko= bj, int 
ret; struct task_struct *task; struct mm_struct *mm; + struct partition_kobj *partition; =20 input =3D kstrdup(buf, GFP_KERNEL); if (!input) @@ -3583,9 +3630,13 @@ static ssize_t trigger_merge_store(struct kobject *k= obj, if (!mm) return -EINVAL; =20 + partition =3D find_partition_by_kobj(kobj); + if (!partition) + return -EINVAL; + mutex_lock(&ksm_thread_mutex); wait_while_offlining(); - ksm_sync_merge(mm, start, end); + ksm_sync_merge(mm, start, end, partition); mutex_unlock(&ksm_thread_mutex); =20 mmput(mm); @@ -3606,6 +3657,8 @@ static ssize_t merge_across_nodes_store(struct kobjec= t *kobj, { int err; unsigned long knob; + struct rb_root *root_stable_tree; + struct partition_kobj *partition; =20 err =3D kstrtoul(buf, 10, &knob); if (err) @@ -3615,6 +3668,10 @@ static ssize_t merge_across_nodes_store(struct kobje= ct *kobj, =20 mutex_lock(&ksm_thread_mutex); wait_while_offlining(); + + partition =3D find_partition_by_kobj(kobj); + root_stable_tree =3D partition->root_stable_tree; + if (ksm_merge_across_nodes !=3D knob) { if (ksm_pages_shared || remove_all_stable_nodes()) err =3D -EBUSY; @@ -3633,10 +3690,10 @@ static ssize_t merge_across_nodes_store(struct kobj= ect *kobj, if (!buf) err =3D -ENOMEM; else { - root_stable_tree =3D buf; - root_unstable_tree =3D buf + nr_node_ids; + partition->root_stable_tree =3D buf; + partition->root_unstable_tree =3D buf + nr_node_ids; /* Stable tree is empty but not the unstable */ - root_unstable_tree[0] =3D one_unstable_tree[0]; + partition->root_unstable_tree[0] =3D one_unstable_tree[0]; } } if (!err) { @@ -3834,14 +3891,6 @@ KSM_ATTR_RO(full_scans); =20 #ifdef CONFIG_SELECTIVE_KSM static struct kobject *ksm_base_kobj; - -struct partition_kobj { - struct kobject *kobj; - struct list_head list; -}; - -static LIST_HEAD(partition_list); - #else /* CONFIG_SELECTIVE_KSM */ static ssize_t smart_scan_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -4055,6 +4104,7 @@ static ssize_t add_partition_store(struct 
kobject *ko= bj, struct partition_kobj *new_partition_kobj; char partition_name[50]; int err; + struct rb_root *tree_root; =20 mutex_lock(&ksm_thread_mutex); =20 @@ -4081,6 +4131,13 @@ static ssize_t add_partition_store(struct kobject *k= obj, goto unlock; } =20 + tree_root =3D kcalloc(nr_node_ids + nr_node_ids, sizeof(*tree_root), GFP_= KERNEL); + if (!tree_root) { + err =3D -ENOMEM; + goto unlock; + } + new_partition_kobj->root_stable_tree =3D tree_root; + new_partition_kobj->root_unstable_tree =3D tree_root + nr_node_ids; err =3D sysfs_create_group(new_partition_kobj->kobj, &ksm_attr_group); if (err) { pr_err("ksm: register sysfs failed\n"); --=20 2.49.0.395.g12beb8f557-goog From nobody Mon Feb 9 15:10:06 2026 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D45B22FE10 for ; Fri, 21 Mar 2025 17:37:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578663; cv=none; b=bMpIqUmZkoOitxIF7SwYd9AoRuk8on7l+tNomCwQZ69u2FF1cKXXtO2D5ljPGx9ltp2NjZD/rP7OC4nyJt1ugOxTEKmY/RZv6cp3qt1yotGG6bNNWJr6ky5LmSqXwa79wCvcdJhUwglgkayMSaAZ/SlSsvKnlRz/xC6Xd2LXeLU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578663; c=relaxed/simple; bh=MwpQce1IYP/xVDV0ph309+450HY6pIxryaUDUtEC12Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=eTjbFoLbXK8YgaLILQyBBR48ih/j8pDB1dVkOgMcIfEcDPaXgZ+QcBpHdwwjDS2r6P4uEOHwWaPwtYC52c9gEAM2wcZaCDl4AoncYCj3J4/PFmxtJsCsDkviSR0KkuXkGL4RImsOFhK1i5VOhYjhRG4QODNJODRefZrB+s/2usc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com; dkim=pass (2048-bit key) 
header.d=google.com header.i=@google.com header.b=Cmihxms7; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Cmihxms7" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-224364f2492so32200915ad.3 for ; Fri, 21 Mar 2025 10:37:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742578661; x=1743183461; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=oVY3EzVTKOjztz/iA3gGAorv/v23BkdoPGQ1O9sMpRg=; b=Cmihxms7VFHae7SGYlvavkpX7+m5uW7WB6zwR4FG9M8/UpMmmVW47uzFf+ngCK2LXJ L5/3nxkF1RnHyLj1c4ubGfcGecs7IWHnT6oucrMq3nmfn8azNZKMYMOdt9ewre0fyeEm xsiC2KNkrz+1BZ/xS3WyvwXPGEoqrdKQTlsRIm91ry+m6Jvz8TObwQsrBSGWKKAphQc+ A+1ZvV27EMXOhuNKbx0an6txFeq/TKWXzDH3bL8vffRLQMp5mSBJATEFktju/PkQy20T b+MbQr2PUotHehP8ynvHTDCQhr9FMCbwdi4fyQ4tP25L6qS3li74IJ1cT44Rvl4wnXLq 1YyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742578661; x=1743183461; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=oVY3EzVTKOjztz/iA3gGAorv/v23BkdoPGQ1O9sMpRg=; b=WnkMkojOprsRG/+pAp0Y2B7efxp3rdC147QXqU4ABYg54kwhxTzA4VGDq8WHAWj2J7 bASCLWCaq1xJKlkSuRRKzdzbdEcoS7IN0r0b3ZCso4Ml+6OP9uwPtJaZlvmgRFnFR1Lv ZZue/19gqB5qq38M/9uEmX8aQkx6wstdL0FpfQjFWyGe/QJmE2Joq/vncEVmnTtuXIBi E/q/E819AmKO/fw2YlacwMim2/98GUYMzDDZNcKiI17oGBcc0bwgBa81p4tSKEs7jTjx 1iPm7geLVtoLmXEqZDUG4jc14EDggUUs8KityulpJpBe6/qGgeuNsT5Fwx7LtvP/tQ0c jzCw== X-Forwarded-Encrypted: i=1; 
AJvYcCUBs8yAkx/sQHd7HH2KuCZfOkscwSeK9iaU2ugDfvYUHu3XluWpH9g7KgEzbb/FV7YPbnULVGiq1LOKgn8=@vger.kernel.org X-Gm-Message-State: AOJu0YxgxvxnLL6JRV0zwgvxwKK0RCwJTOTlQRcgUVQ261BHMmbfiwuW cwYtKYthzULiU7TQ4ItxE2S085bKvdAH02EO/pVSA2o0+DRswssseEeWJ629u8xxHVJeaw3tDP0 8DedHMxRplLsbcesQYbBvcA== X-Google-Smtp-Source: AGHT+IF+cvqsQFV2a6Y9/WZ27WhrC22Tdz6Ik3urAFxxcSKORKZ17tAg0m4s+cXw6DmZenA0G9mdHnaddcz52VU3Tw== X-Received: from pfbdo13.prod.google.com ([2002:a05:6a00:4a0d:b0:736:a055:1ce3]) (user=souravpanda job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:218b:b0:736:7a00:e522 with SMTP id d2e1a72fcca58-7390593b7f2mr6721955b3a.2.1742578661516; Fri, 21 Mar 2025 10:37:41 -0700 (PDT) Date: Fri, 21 Mar 2025 17:37:28 +0000 In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250321173729.3175898-1-souravpanda@google.com> X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog Message-ID: <20250321173729.3175898-6-souravpanda@google.com> Subject: [RFC PATCH 5/6] mm: trigger unmerge and remove SELECTIVE KSM partition From: Sourav Panda To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com, pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com, gthelen@google.com, souravpanda@google.com, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Trigger unmerge or remove a partition using the following sysfs interface: Triggering an unmerge for a specific partition: echo "pid" > /sys/kernel/mm/ksm/partition_name/trigger_unmerge Removing a partition: echo "partition_to_remove" > /sys/kernel/mm/ksm/control/remove_partition Limitation of current implementation: On carrying out trigger_unmerge, we unmerge all rmap items which is wrong. 
We should only unmerge the rmap items that belong to the partition where we called unmerge. Another limitation is that we do not specify the address range when echoing into trigger_unmerge. This is intentionally left out until we determine the feasibility of the implementation. Signed-off-by: Sourav Panda --- mm/ksm.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 120 insertions(+) diff --git a/mm/ksm.c b/mm/ksm.c index b575250aaf45..fd7626d5d8c9 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2556,6 +2556,31 @@ static void ksm_sync_merge(struct mm_struct *mm, put_page(page); } =20 +static void ksm_sync_unmerge(struct mm_struct *mm) +{ + struct mm_slot *slot; + struct ksm_mm_slot *mm_slot; + + struct vm_area_struct *vma; + struct vma_iterator vmi; + + slot =3D mm_slot_lookup(mm_slots_hash, mm); + mm_slot =3D container_of(slot, struct ksm_mm_slot, slot); + + ksm_scan.address =3D 0; + vma_iter_init(&vmi, mm, ksm_scan.address); + + mmap_read_lock(mm); + for_each_vma(vmi, vma) { + if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma) + continue; + unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end, false); + } + remove_trailing_rmap_items(&mm_slot->rmap_list); + + mmap_read_unlock(mm); +} + #else /* CONFIG_SELECTIVE_KSM */ /* * Calculate skip age for the ksm page age. 
The age determines how often @@ -3644,6 +3669,58 @@ static ssize_t trigger_merge_store(struct kobject *k= obj, } KSM_ATTR(trigger_merge); =20 +static ssize_t trigger_unmerge_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return -EINVAL; /* Not yet implemented */ +} + +static ssize_t trigger_unmerge_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + pid_t pid; + char *input, *ptr; + int ret; + struct task_struct *task; + struct mm_struct *mm; + + input =3D kstrdup(buf, GFP_KERNEL); + if (!input) + return -ENOMEM; + + ptr =3D strim(input); + ret =3D kstrtoint(ptr, 10, &pid); + kfree(input); + + /* Find the mm_struct */ + rcu_read_lock(); + task =3D find_task_by_vpid(pid); + if (!task) { + rcu_read_unlock(); + return -ESRCH; + } + + get_task_struct(task); + + rcu_read_unlock(); + mm =3D get_task_mm(task); + put_task_struct(task); + + if (!mm) + return -EINVAL; + + mutex_lock(&ksm_thread_mutex); + wait_while_offlining(); + ksm_sync_unmerge(mm); + mutex_unlock(&ksm_thread_mutex); + + mmput(mm); + return count; +} +KSM_ATTR(trigger_unmerge); + #ifdef CONFIG_NUMA static ssize_t merge_across_nodes_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -4044,6 +4121,7 @@ static struct attribute *ksm_attrs[] =3D { &pages_to_scan_attr.attr, &run_attr.attr, &trigger_merge_attr.attr, + &trigger_unmerge_attr.attr, &pages_scanned_attr.attr, &pages_shared_attr.attr, &pages_sharing_attr.attr, @@ -4156,9 +4234,51 @@ static ssize_t add_partition_store(struct kobject *k= obj, static struct kobj_attribute add_kobj_attr =3D __ATTR(add_partition, 0220,= NULL, add_partition_store); =20 +static ssize_t remove_partition_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct partition_kobj *partition; + struct partition_kobj *partition_found =3D NULL; + char partition_name[50]; + int err =3D 0; + + if (sscanf(buf, "%31s", partition_name) !=3D 1) + return 
-EINVAL; + + mutex_lock(&ksm_thread_mutex); + + list_for_each_entry(partition, &partition_list, list) { + if (strcmp(kobject_name(partition->kobj), partition_name) =3D=3D 0) { + partition_found =3D partition; + break; + } + } + + if (!partition_found) { + err =3D -ENOENT; + goto unlock; + } + + unmerge_and_remove_all_rmap_items(); + + kobject_put(partition_found->kobj); + list_del(&partition_found->list); + kfree(partition_found->root_stable_tree); + kfree(partition_found); + +unlock: + mutex_unlock(&ksm_thread_mutex); + return err ? err : count; +} + +static struct kobj_attribute rm_kobj_attr =3D __ATTR(remove_partition, 022= 0, NULL, + remove_partition_store); + /* Array of attributes for base kobject */ static struct attribute *ksm_base_attrs[] =3D { &add_kobj_attr.attr, + &rm_kobj_attr.attr, NULL, /* NULL-terminated */ }; =20 --=20 2.49.0.395.g12beb8f557-goog From nobody Mon Feb 9 15:10:06 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 915BA22FF55 for ; Fri, 21 Mar 2025 17:37:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578665; cv=none; b=oiM9mSwDwZpEkhM11Yh58BrGnXpo11lAvD74eoQR/hfGAReQB0E9BRW1bI/23e0POKSoGh4Szr02UjZwdJxg6sksncBgy8iKoJZ7ntRUmgyIkK9InMbmzmhL8mhngSdFM1O1SiGQZn2koQuid+SAkbdCuGYaj/7VgECzSeo4Odw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742578665; c=relaxed/simple; bh=IV5KDKRwIWWBaHx1XZgeNA+fBEOe7w8Yoh7U/uZjyGs=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=A4XMVBoXpSn81zn0jlYw+f9Lz3ABP+TctUFier458bmIIbvQruCtXuGMzK0qGETRwHtLmznHFhI12qfoOfyICdCGLiyOdNuWDXEFm3b8KjwkIGnQldG+Ivr9TSHv6QxgLx9WfVQ/X0CZ3/2frHcYPSFdf9tuhybosYxDA+O+f8k= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=KbQMQx5E; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--souravpanda.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KbQMQx5E" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2ff6aaa18e8so3394056a91.1 for ; Fri, 21 Mar 2025 10:37:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742578663; x=1743183463; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=vGIlFeRwp00cBGERgxxLQ4RbBbIFX2vI8pKUuAePCaE=; b=KbQMQx5E4HCCI1O7oiTYw83/0AzF7pwXebfVq/i+dLfdtZm1bTNfcbeTtNQMMQ+Zbf nurfb9JSYjGAYkkfqpqFxX5Upgm+lzAmM+yPfF/iJIhddZnyz9L6ZH+aJuAZLKlFx+KW LIe3wWCp2P3x5AhEcvls4nKez3UQy9PaxIdl943204TtIVPL8/ErUTqgMLTsxe+LY09Q QXMyrVrBFP//OLNXrXNwVzL/T70cSmPIk19q2yXjTNrR0/xAxtRZa359i/HGo6teqpn/ 5RMlP4RztXeTHr+m8jnDvVPVcpdKamDo7xAbweubzZZ+F/DbZElvr60NddP4oQofBsl2 zTlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742578663; x=1743183463; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vGIlFeRwp00cBGERgxxLQ4RbBbIFX2vI8pKUuAePCaE=; b=WCpvWrypRJJWn3a6M0z3ijkbGKCUBgvzeyFRm3OmICL55+skaXNy0NOKbvxa6H/rK4 /mJVRBv83sZuKDKmhkzjWQoou7FnoyTLZAoIoxP3zR78cooCTDZr6nVb1y5yjZDyRAGH 0aInQMBOHofjfhilW+9BJa47aNQ+6GHFCvCDxdqX2qWgSMO0/JPwmcD4cOjDeKBtGgc0 
e5Y301ndjJA/KK3RavLXk5kHkcn3ydEF6vDjm5cncKacu0dyue2B+Ui6C+25CYzbAM87 JHXW9yMHrBGIfhj1uBajl2KSpB3H+Va7LQLm0AXXH018zj+Fc3EhA8S8C+IKaJoyeziT 8v9Q== X-Forwarded-Encrypted: i=1; AJvYcCVaZCGTPDzc3vSe2chSgShMdBjvOJM/hOrNj4YlbMmFq9vZHOSwnVqnwCoPRkmdCXEGh/+INAOv4XLwkpc=@vger.kernel.org X-Gm-Message-State: AOJu0YwIGohQWzIJiXC8Mr+TaiMt7FV1N2NkawcMGoM8Q6zmAtl+SRQm HkeezUIL9ZY97GH0zw7eBniRARrTkhezT6bFHsGQ+Dx0001iNhAtoy5deA1q1CxMyPPh8zVwUPc Estd/iPWhTrucbJvtCK84/Q== X-Google-Smtp-Source: AGHT+IEi+XabJeuTZlJOMXJIuxzHgrE4b595XPIHNxfB9i6zloVLdGL4AH5NwI1Pnt5e4UJSRIwRL82NwESkLDRViQ== X-Received: from pjg4.prod.google.com ([2002:a17:90b:3f44:b0:301:2679:9aa]) (user=souravpanda job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:280b:b0:2ee:d824:b559 with SMTP id 98e67ed59e1d1-3030fef09b9mr6204491a91.28.1742578662931; Fri, 21 Mar 2025 10:37:42 -0700 (PDT) Date: Fri, 21 Mar 2025 17:37:29 +0000 In-Reply-To: <20250321173729.3175898-1-souravpanda@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250321173729.3175898-1-souravpanda@google.com> X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog Message-ID: <20250321173729.3175898-7-souravpanda@google.com> Subject: [RFC PATCH 6/6] mm: syscall alternative for SELECTIVE_KSM From: Sourav Panda To: mathieu.desnoyers@efficios.com, willy@infradead.org, david@redhat.com, pasha.tatashin@soleen.com, rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, weixugc@google.com, gthelen@google.com, souravpanda@google.com, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Partition can be created or opened using: int ksm_fd =3D ksm_open(ksm_name, flag); name specifies the ksm partition to be created or opened. flags: O_CREAT Create the ksm partition object if it does not exist. 
O_EXCL If O_CREAT was also specified, and a ksm partition object with the given name already exists, return an error. Trigger the merge using: ksm_merge(ksm_fd, pid, start_addr, size); Limitation: Only supporting x86 syscall_64. Signed-off-by: Sourav Panda --- arch/x86/entry/syscalls/syscall_64.tbl | 3 +- include/linux/ksm.h | 4 + mm/ksm.c | 156 ++++++++++++++++++++++++- 3 files changed, 161 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscal= ls/syscall_64.tbl index 5eb708bff1c7..352d747dbe33 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -390,7 +390,8 @@ 464 common getxattrat sys_getxattrat 465 common listxattrat sys_listxattrat 466 common removexattrat sys_removexattrat - +467 common ksm_open sys_ksm_open +468 common ksm_merge sys_ksm_merge # # Due to a historical design error, certain syscalls are numbered differen= tly # in x32 as compared to native x86_64. These syscalls have numbers 512-54= 7. 
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index d73095b5cd96..a94c89403c29 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -14,6 +14,10 @@
 #include
 #include
 
+#include
+#include
+#define MAX_KSM_NAME_LEN 128
+
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags);
diff --git a/mm/ksm.c b/mm/ksm.c
index fd7626d5d8c9..71558120b034 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -147,7 +147,8 @@ struct ksm_scan {
 static struct kobject *ksm_base_kobj;
 
 struct partition_kobj {
-	struct kobject *kobj;
+	struct kobject *kobj;	/* Not required for the syscall interface */
+	char name[MAX_KSM_NAME_LEN];
 	struct list_head list;
 	struct rb_root *root_stable_tree;
 	struct rb_root *root_unstable_tree;
@@ -166,6 +167,106 @@ static struct partition_kobj *find_partition_by_kobj(struct kobject *kobj)
 	return NULL;
 }
 
+static struct partition_kobj *find_ksm_partition(char *partition_name)
+{
+	struct partition_kobj *partition;
+
+	list_for_each_entry(partition, &partition_list, list) {
+		if (strcmp(partition->name, partition_name) == 0)
+			return partition;
+	}
+	return NULL;
+}
+
+static DEFINE_MUTEX(ksm_partition_lock);
+
+static int ksm_release(struct inode *inode, struct file *file)
+{
+	struct partition_kobj *ksm = file->private_data;
+
+	mutex_lock(&ksm_partition_lock);
+	list_del(&ksm->list);
+	mutex_unlock(&ksm_partition_lock);
+
+	kfree(ksm);
+	return 0;
+}
+
+static const struct file_operations ksm_fops = {
+	.release = ksm_release,
+};
+
+static struct partition_kobj *ksm_create_partition(char *ksm_name)
+{
+	struct partition_kobj *partition;
+	struct rb_root *tree_root;
+
+	partition = kzalloc(sizeof(*partition), GFP_KERNEL);
+	if (!partition)
+		return NULL;
+
+	tree_root = kcalloc(nr_node_ids + nr_node_ids, sizeof(*tree_root),
+			    GFP_KERNEL);
+	if (!tree_root)
+		return NULL;
+
+	partition->root_stable_tree = tree_root;
+	partition->root_unstable_tree = tree_root + nr_node_ids;
+	strncpy(partition->name, ksm_name, sizeof(partition->name));
+
+	list_add(&partition->list, &partition_list);
+
+	return partition;
+}
+
+static int ksm_partition_fd(struct partition_kobj *partition)
+{
+	int fd;
+	struct file *file;
+	int ret;
+
+	file = anon_inode_getfile("ksm_partition", &ksm_fops, partition, O_RDWR);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		return ret;
+	}
+
+	fd = get_unused_fd_flags(O_RDWR);
+	if (fd < 0) {
+		fput(file);
+		return fd;
+	}
+	fd_install(fd, file);
+	return fd;
+}
+
+SYSCALL_DEFINE2(ksm_open, const char __user *, ksm_name, int, flags) {
+	char name[MAX_KSM_NAME_LEN];
+	struct partition_kobj *partition;
+	int ret;
+
+	ret = strncpy_from_user(name, ksm_name, sizeof(name));
+	if (ret < 0)
+		return -EFAULT;
+
+	partition = find_ksm_partition(name);
+
+	if (flags & O_EXCL && partition) /* Partition already exists, return error */
+		return -EEXIST;
+
+	if (flags & O_CREAT && !partition) {
+		/* Partition does not exist, but we are allowed to create one */
+		mutex_lock(&ksm_partition_lock);
+		partition = ksm_create_partition(name);
+		mutex_unlock(&ksm_partition_lock);
+	}
+
+	if (!partition)
+		return flags & O_CREAT ? -ENOMEM : -ENOENT;
+
+	return ksm_partition_fd(partition);
+}
+
 /**
  * struct ksm_stable_node - node of the stable rbtree
  * @node: rb node of this ksm page in the stable tree
@@ -4324,6 +4425,59 @@ static int __init ksm_thread_sysfs_init(void)
 }
 #endif /* CONFIG_SELECTIVE_KSM */
 
+SYSCALL_DEFINE4(ksm_merge, int, ksm_fd, pid_t, pid, unsigned long, start, size_t, size) {
+	unsigned long end = start + size;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	struct partition_kobj *partition;
+	struct file *file;
+
+	file = fget(ksm_fd);
+	if (!file)
+		return -EBADF;
+
+	partition = file->private_data;
+	if (!partition) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	if (start >= end) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	/* Find the mm_struct */
+	rcu_read_lock();
+	task = find_task_by_vpid(pid);
+	if (!task) {
+		fput(file);
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+
+	get_task_struct(task);
+
+	rcu_read_unlock();
+	mm = get_task_mm(task);
+	put_task_struct(task);
+
+	if (!mm) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	mutex_lock(&ksm_thread_mutex);
+	wait_while_offlining();
+	ksm_sync_merge(mm, start, end, partition);
+	mutex_unlock(&ksm_thread_mutex);
+
+	mmput(mm);
+
+	fput(file);
+	return 0;
+}
+
 static int __init ksm_init(void)
 {
 	int err;
-- 
2.49.0.395.g12beb8f557-goog
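
For readers who want to try the interface from userspace, here is a minimal
sketch of wrappers around the two syscalls. It is not part of the patch and
rests on assumptions: the syscall numbers (467 and 468) are simply the ones
this RFC adds to syscall_64.tbl, the macro names KSM_OPEN_NR and KSM_MERGE_NR
and the helper ksm_range_valid() are made up for illustration, and no libc
provides these wrappers. On a kernel without this series the numbers may be
unassigned or belong to an unrelated syscall, so the wrappers should only be
called on a kernel built with the series applied.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Assumption: syscall numbers as assigned by the syscall_64.tbl hunk of
 * this RFC. They are not part of any released kernel ABI and only make
 * sense on an x86-64 kernel built with this series applied.
 */
#define KSM_OPEN_NR	467	/* hypothetical macro name */
#define KSM_MERGE_NR	468	/* hypothetical macro name */

/* Mirrors the kernel-side "start >= end" check, including wraparound. */
int ksm_range_valid(unsigned long start, size_t size)
{
	unsigned long end = start + size;

	return start < end;
}

/* Open (or, with O_CREAT, create) the named partition; returns an fd or -1. */
int ksm_open(const char *ksm_name, int flags)
{
	return (int)syscall(KSM_OPEN_NR, ksm_name, flags);
}

/* Ask the kernel to merge [start, start + size) of process pid; 0 or -1. */
int ksm_merge(int ksm_fd, pid_t pid, unsigned long start, size_t size)
{
	if (!ksm_range_valid(start, size)) {
		errno = EINVAL;	/* reject locally what the kernel would reject */
		return -1;
	}
	return (int)syscall(KSM_MERGE_NR, ksm_fd, pid, start, size);
}
```

A caller would typically open a partition with ksm_open(name, O_CREAT), pass
the returned descriptor to ksm_merge() together with a pid and an address
range, and close() the descriptor when done. Going by ksm_release() in the
patch, closing the descriptor removes the partition from the list and frees
it, so the descriptor has to stay open for as long as the partition is needed.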