From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 1/6] zram: defer slot free notification
Date: Mon, 27 Jan 2025 16:59:26 +0900
Message-ID: <20250127080254.1302026-2-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

At the moment ->swap_slot_free_notify is called from an atomic section
(under spin-lock), which makes it impossible to make zsmalloc fully
preemptible. Defer slot-free handling to a non-atomic context.
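The deferral itself is the standard list-plus-workqueue handoff: record the
index from the atomic notifier, free it later from process context. A minimal
sketch of the pattern, with hypothetical names (the authoritative
implementation is in the diff below):

	/* Sketch of the defer-to-workqueue pattern; names are illustrative. */
	#include <linux/list.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	struct deferred_item {
		struct list_head entry;
		unsigned long index;
	};

	static LIST_HEAD(pending);		/* items queued from atomic context */
	static DEFINE_SPINLOCK(pending_lock);	/* protects the pending list */

	static void process_pending(struct work_struct *work)
	{
		struct deferred_item *item;

		/* runs in process context: sleeping is allowed here */
		spin_lock(&pending_lock);
		while (!list_empty(&pending)) {
			item = list_first_entry(&pending, struct deferred_item,
						entry);
			list_del(&item->entry);
			spin_unlock(&pending_lock);

			/* ... do the actual work, which may sleep ... */
			kfree(item);

			spin_lock(&pending_lock);
		}
		spin_unlock(&pending_lock);
	}

	static DECLARE_WORK(pending_work, process_pending);

	/* called from atomic context */
	static void defer(unsigned long index)
	{
		struct deferred_item *item = kzalloc(sizeof(*item), GFP_ATOMIC);

		if (!item)
			return;	/* caller must account for the missed free */

		item->index = index;
		spin_lock(&pending_lock);
		list_add_tail(&item->entry, &pending);
		spin_unlock(&pending_lock);
		schedule_work(&pending_work);
	}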
Signed-off-by: Sergey Senozhatsky
---
 drivers/block/zram/zram_drv.c | 66 +++++++++++++++++++++++++++++++++--
 drivers/block/zram/zram_drv.h |  4 +++
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ad3e8885b0d2..9c72beb86ab0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2315,9 +2315,52 @@ static void zram_submit_bio(struct bio *bio)
 	}
 }
 
+static void async_slot_free(struct work_struct *work)
+{
+	struct zram *zram = container_of(work, struct zram, slot_free_work);
+
+	spin_lock(&zram->slot_free_lock);
+	while (!list_empty(&zram->slot_free_list)) {
+		struct zram_pp_slot *pps;
+
+		pps = list_first_entry(&zram->slot_free_list,
+				       struct zram_pp_slot,
+				       entry);
+		list_del_init(&pps->entry);
+		spin_unlock(&zram->slot_free_lock);
+
+		zram_slot_write_lock(zram, pps->index);
+		if (zram_test_flag(zram, pps->index, ZRAM_PP_SLOT))
+			zram_free_page(zram, pps->index);
+		zram_slot_write_unlock(zram, pps->index);
+
+		kfree(pps);
+		spin_lock(&zram->slot_free_lock);
+	}
+	spin_unlock(&zram->slot_free_lock);
+};
+
+static void zram_kick_slot_free(struct zram *zram)
+{
+	schedule_work(&zram->slot_free_work);
+}
+
+static void zram_flush_slot_free(struct zram *zram)
+{
+	flush_work(&zram->slot_free_work);
+}
+
+static void zram_init_slot_free(struct zram *zram)
+{
+	spin_lock_init(&zram->slot_free_lock);
+	INIT_LIST_HEAD(&zram->slot_free_list);
+	INIT_WORK(&zram->slot_free_work, async_slot_free);
+}
+
 static void zram_slot_free_notify(struct block_device *bdev,
-				unsigned long index)
+				  unsigned long index)
 {
+	struct zram_pp_slot *pps;
 	struct zram *zram;
 
 	zram = bdev->bd_disk->private_data;
@@ -2328,7 +2371,24 @@ static void zram_slot_free_notify(struct block_device *bdev,
 		return;
 	}
 
-	zram_free_page(zram, index);
+	if (zram_test_flag(zram, index, ZRAM_PP_SLOT))
+		goto out;
+
+	pps = kzalloc(sizeof(*pps), GFP_ATOMIC);
+	if (!pps) {
+		atomic64_inc(&zram->stats.miss_free);
+		goto out;
+	}
+
+	INIT_LIST_HEAD(&pps->entry);
+	pps->index = index;
+	zram_set_flag(zram, index, ZRAM_PP_SLOT);
+	spin_lock(&zram->slot_free_lock);
+	list_add(&pps->entry, &zram->slot_free_list);
+	spin_unlock(&zram->slot_free_lock);
+
+	zram_kick_slot_free(zram);
+out:
 	zram_slot_write_unlock(zram, index);
 }
 
@@ -2473,6 +2533,7 @@ static ssize_t reset_store(struct device *dev,
 
 	/* Make sure all the pending I/O are finished */
 	sync_blockdev(disk->part0);
+	zram_flush_slot_free(zram);
 	zram_reset_device(zram);
 
 	mutex_lock(&disk->open_mutex);
@@ -2618,6 +2679,7 @@ static int zram_add(void)
 	atomic_set(&zram->pp_in_progress, 0);
 	zram_comp_params_reset(zram);
 	comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor);
+	zram_init_slot_free(zram);
 
 	/* Actual capacity set using sysfs (/sys/block/zram/disksize */
 	set_capacity(zram->disk, 0);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b7e250d6fa02..27ca269f4a4e 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -134,5 +134,9 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 	atomic_t pp_in_progress;
+
+	spinlock_t slot_free_lock;
+	struct list_head slot_free_list;
+	struct work_struct slot_free_work;
 };
 #endif
-- 
2.48.1.262.g85cc9f2d1e-goog

From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 2/6] zsmalloc: make zspage lock preemptible
Date: Mon, 27 Jan 2025 16:59:27 +0900
Message-ID: <20250127080254.1302026-3-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Switch over from rwlock_t to an atomic_t variable that takes a negative
value when the zspage is under migration, and positive values when the
zspage is used by zsmalloc users (object mapping, etc.). A per-zspage
rwsem is a little too memory-heavy; a simple atomic_t should suffice,
since we only need to mark a zspage as either used-for-write or
used-for-read. This is needed to make zsmalloc preemptible in the
future.
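The lock described above is effectively a counter-based reader/writer
spinlock: 0 means unlocked, -1 means write-locked, and n > 0 means n active
readers. A compilable userspace C11 approximation of the same state machine
(illustrative only; the kernel version in the diff below uses atomic_t,
atomic_cmpxchg() and cpu_relax()):

	#include <stdatomic.h>
	#include <sched.h>

	#define PAGE_UNLOCKED  0
	#define PAGE_WRLOCKED -1

	static atomic_int lock = PAGE_UNLOCKED;

	static void read_lock_page(void)
	{
		int old;

		for (;;) {
			old = atomic_load(&lock);
			if (old == PAGE_WRLOCKED) {	/* writer active, spin */
				sched_yield();
				continue;
			}
			/* try to become one more reader: old -> old + 1 */
			if (atomic_compare_exchange_weak(&lock, &old, old + 1))
				return;
		}
	}

	static void read_unlock_page(void)
	{
		atomic_fetch_sub(&lock, 1);
	}

	static void write_lock_page(void)
	{
		int expected;

		for (;;) {
			expected = PAGE_UNLOCKED;
			/* succeeds only with no readers and no writer */
			if (atomic_compare_exchange_weak(&lock, &expected,
							 PAGE_WRLOCKED))
				return;
			sched_yield();
		}
	}

	static void write_unlock_page(void)
	{
		atomic_store(&lock, PAGE_UNLOCKED);
	}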
Signed-off-by: Sergey Senozhatsky
---
 mm/zsmalloc.c | 112 +++++++++++++++++++++++++++++---------------------
 1 file changed, 66 insertions(+), 46 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 817626a351f8..28a75bfbeaa6 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -257,6 +257,9 @@ static inline void free_zpdesc(struct zpdesc *zpdesc)
 	__free_page(page);
 }
 
+#define ZS_PAGE_UNLOCKED	0
+#define ZS_PAGE_WRLOCKED	-1
+
 struct zspage {
 	struct {
 		unsigned int huge:HUGE_BITS;
@@ -269,7 +272,7 @@ struct zspage {
 	struct zpdesc *first_zpdesc;
 	struct list_head list; /* fullness list */
 	struct zs_pool *pool;
-	rwlock_t lock;
+	atomic_t lock;
 };
 
 struct mapping_area {
@@ -290,11 +293,53 @@ static bool ZsHugePage(struct zspage *zspage)
 	return zspage->huge;
 }
 
-static void migrate_lock_init(struct zspage *zspage);
-static void migrate_read_lock(struct zspage *zspage);
-static void migrate_read_unlock(struct zspage *zspage);
-static void migrate_write_lock(struct zspage *zspage);
-static void migrate_write_unlock(struct zspage *zspage);
+static void zspage_lock_init(struct zspage *zspage)
+{
+	atomic_set(&zspage->lock, ZS_PAGE_UNLOCKED);
+}
+
+static void zspage_read_lock(struct zspage *zspage)
+{
+	atomic_t *lock = &zspage->lock;
+	int old;
+
+	while (1) {
+		old = atomic_read(lock);
+		if (old == ZS_PAGE_WRLOCKED) {
+			cpu_relax();
+			continue;
+		}
+
+		if (atomic_cmpxchg(lock, old, old + 1) == old)
+			return;
+
+		cpu_relax();
+	}
+}
+
+static void zspage_read_unlock(struct zspage *zspage)
+{
+	atomic_dec(&zspage->lock);
+}
+
+static void zspage_write_lock(struct zspage *zspage)
+{
+	atomic_t *lock = &zspage->lock;
+	int old;
+
+	while (1) {
+		old = atomic_cmpxchg(lock, ZS_PAGE_UNLOCKED, ZS_PAGE_WRLOCKED);
+		if (old == ZS_PAGE_UNLOCKED)
+			return;
+
+		cpu_relax();
+	}
+}
+
+static void zspage_write_unlock(struct zspage *zspage)
+{
+	atomic_set(&zspage->lock, ZS_PAGE_UNLOCKED);
+}
 
 #ifdef CONFIG_COMPACTION
 static void kick_deferred_free(struct zs_pool *pool);
@@ -992,7 +1037,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		return NULL;
 
 	zspage->magic = ZSPAGE_MAGIC;
-	migrate_lock_init(zspage);
+	zspage_lock_init(zspage);
 
 	for (i = 0; i < class->pages_per_zspage; i++) {
 		struct zpdesc *zpdesc;
@@ -1217,7 +1262,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 * zs_unmap_object API so delegate the locking from class to zspage
 	 * which is smaller granularity.
 	 */
-	migrate_read_lock(zspage);
+	zspage_read_lock(zspage);
 	read_unlock(&pool->migrate_lock);
 
 	class = zspage_class(pool, zspage);
@@ -1277,7 +1322,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	}
 	local_unlock(&zs_map_area.lock);
 
-	migrate_read_unlock(zspage);
+	zspage_read_unlock(zspage);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1671,18 +1716,18 @@ static void lock_zspage(struct zspage *zspage)
 	/*
 	 * Pages we haven't locked yet can be migrated off the list while we're
 	 * trying to lock them, so we need to be careful and only attempt to
-	 * lock each page under migrate_read_lock(). Otherwise, the page we lock
+	 * lock each page under zspage_read_lock(). Otherwise, the page we lock
 	 * may no longer belong to the zspage. This means that we may wait for
 	 * the wrong page to unlock, so we must take a reference to the page
-	 * prior to waiting for it to unlock outside migrate_read_lock().
+	 * prior to waiting for it to unlock outside zspage_read_lock().
 	 */
 	while (1) {
-		migrate_read_lock(zspage);
+		zspage_read_lock(zspage);
 		zpdesc = get_first_zpdesc(zspage);
 		if (zpdesc_trylock(zpdesc))
 			break;
 		zpdesc_get(zpdesc);
-		migrate_read_unlock(zspage);
+		zspage_read_unlock(zspage);
 		zpdesc_wait_locked(zpdesc);
 		zpdesc_put(zpdesc);
 	}
@@ -1693,41 +1738,16 @@ static void lock_zspage(struct zspage *zspage)
 			curr_zpdesc = zpdesc;
 		} else {
 			zpdesc_get(zpdesc);
-			migrate_read_unlock(zspage);
+			zspage_read_unlock(zspage);
 			zpdesc_wait_locked(zpdesc);
 			zpdesc_put(zpdesc);
-			migrate_read_lock(zspage);
+			zspage_read_lock(zspage);
 		}
 	}
-	migrate_read_unlock(zspage);
+	zspage_read_unlock(zspage);
 }
 #endif /* CONFIG_COMPACTION */
 
-static void migrate_lock_init(struct zspage *zspage)
-{
-	rwlock_init(&zspage->lock);
-}
-
-static void migrate_read_lock(struct zspage *zspage) __acquires(&zspage->lock)
-{
-	read_lock(&zspage->lock);
-}
-
-static void migrate_read_unlock(struct zspage *zspage) __releases(&zspage->lock)
-{
-	read_unlock(&zspage->lock);
-}
-
-static void migrate_write_lock(struct zspage *zspage)
-{
-	write_lock(&zspage->lock);
-}
-
-static void migrate_write_unlock(struct zspage *zspage)
-{
-	write_unlock(&zspage->lock);
-}
-
 #ifdef CONFIG_COMPACTION
 
 static const struct movable_operations zsmalloc_mops;
@@ -1803,8 +1823,8 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 * the class lock protects zpage alloc/free in the zspage.
 	 */
 	spin_lock(&class->lock);
-	/* the migrate_write_lock protects zpage access via zs_map_object */
-	migrate_write_lock(zspage);
+	/* the zspage_write_lock protects zpage access via zs_map_object */
+	zspage_write_lock(zspage);
 
 	offset = get_first_obj_offset(zpdesc);
 	s_addr = kmap_local_zpdesc(zpdesc);
@@ -1835,7 +1855,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 */
 	write_unlock(&pool->migrate_lock);
 	spin_unlock(&class->lock);
-	migrate_write_unlock(zspage);
+	zspage_write_unlock(zspage);
 
 	zpdesc_get(newzpdesc);
 	if (zpdesc_zone(newzpdesc) != zpdesc_zone(zpdesc)) {
@@ -1971,9 +1991,9 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		if (!src_zspage)
 			break;
 
-		migrate_write_lock(src_zspage);
+		zspage_write_lock(src_zspage);
 		migrate_zspage(pool, src_zspage, dst_zspage);
-		migrate_write_unlock(src_zspage);
+		zspage_write_unlock(src_zspage);
 
 		fg = putback_zspage(class, src_zspage);
 		if (fg == ZS_INUSE_RATIO_0) {
-- 
2.48.1.262.g85cc9f2d1e-goog

From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 3/6] zsmalloc: convert to sleepable pool lock
Date: Mon, 27 Jan 2025 16:59:28 +0900
Message-ID: <20250127080254.1302026-4-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Switch over from rwlock_t to an rw_semaphore and introduce simple
helpers to lock/unlock the pool. This is needed to make zsmalloc
preemptible in the future.
Signed-off-by: Sergey Senozhatsky
---
 mm/zsmalloc.c | 58 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 28a75bfbeaa6..751871ec533f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -204,8 +204,8 @@ struct link_free {
 };
 
 struct zs_pool {
-	const char *name;
-
+	/* protect page/zspage migration */
+	struct rw_semaphore migrate_lock;
 	struct size_class *size_class[ZS_SIZE_CLASSES];
 	struct kmem_cache *handle_cachep;
 	struct kmem_cache *zspage_cachep;
@@ -216,6 +216,7 @@ struct zs_pool {
 
 	/* Compact classes */
 	struct shrinker *shrinker;
+	atomic_t compaction_in_progress;
 
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry *stat_dentry;
@@ -223,11 +224,34 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct work_struct free_work;
 #endif
-	/* protect page/zspage migration */
-	rwlock_t migrate_lock;
-	atomic_t compaction_in_progress;
+	const char *name;
 };
 
+static void pool_write_unlock(struct zs_pool *pool)
+{
+	up_write(&pool->migrate_lock);
+}
+
+static void pool_write_lock(struct zs_pool *pool)
+{
+	down_write(&pool->migrate_lock);
+}
+
+static void pool_read_unlock(struct zs_pool *pool)
+{
+	up_read(&pool->migrate_lock);
+}
+
+static void pool_read_lock(struct zs_pool *pool)
+{
+	down_read(&pool->migrate_lock);
+}
+
+static bool zspool_lock_is_contended(struct zs_pool *pool)
+{
+	return rwsem_is_contended(&pool->migrate_lock);
+}
+
 static inline void zpdesc_set_first(struct zpdesc *zpdesc)
 {
 	SetPagePrivate(zpdesc_page(zpdesc));
@@ -1251,7 +1275,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	BUG_ON(in_interrupt());
 
 	/* It guarantees it can get zspage from handle safely */
-	read_lock(&pool->migrate_lock);
+	pool_read_lock(pool);
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &zpdesc, &obj_idx);
 	zspage = get_zspage(zpdesc);
@@ -1263,7 +1287,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 * which is smaller granularity.
 	 */
 	zspage_read_lock(zspage);
-	read_unlock(&pool->migrate_lock);
+	pool_read_unlock(pool);
 
 	class = zspage_class(pool, zspage);
 	off = offset_in_page(class->size * obj_idx);
@@ -1498,13 +1522,13 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	 * The pool->migrate_lock protects the race with zpage's migration
 	 * so it's safe to get the page from handle.
 	 */
-	read_lock(&pool->migrate_lock);
+	pool_read_lock(pool);
 	obj = handle_to_obj(handle);
 	obj_to_zpdesc(obj, &f_zpdesc);
 	zspage = get_zspage(f_zpdesc);
 	class = zspage_class(pool, zspage);
 	spin_lock(&class->lock);
-	read_unlock(&pool->migrate_lock);
+	pool_read_unlock(pool);
 
 	class_stat_sub(class, ZS_OBJS_INUSE, 1);
 	obj_free(class->size, obj);
@@ -1816,7 +1840,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 * The pool migrate_lock protects the race between zpage migration
 	 * and zs_free.
 	 */
-	write_lock(&pool->migrate_lock);
+	pool_write_lock(pool);
 	class = zspage_class(pool, zspage);
 
 	/*
@@ -1853,7 +1877,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 * Since we complete the data copy and set up new zspage structure,
 	 * it's okay to release migration_lock.
 	 */
-	write_unlock(&pool->migrate_lock);
+	pool_write_unlock(pool);
 	spin_unlock(&class->lock);
 	zspage_write_unlock(zspage);
 
@@ -1976,7 +2000,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * protect the race between zpage migration and zs_free
 	 * as well as zpage allocation/free
 	 */
-	write_lock(&pool->migrate_lock);
+	pool_write_lock(pool);
 	spin_lock(&class->lock);
 	while (zs_can_compact(class)) {
 		int fg;
@@ -2003,14 +2027,14 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		src_zspage = NULL;
 
 		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
-		    || rwlock_is_contended(&pool->migrate_lock)) {
+		    || zspool_lock_is_contended(pool)) {
 			putback_zspage(class, dst_zspage);
 			dst_zspage = NULL;
 
 			spin_unlock(&class->lock);
-			write_unlock(&pool->migrate_lock);
+			pool_write_unlock(pool);
 			cond_resched();
-			write_lock(&pool->migrate_lock);
+			pool_write_lock(pool);
 			spin_lock(&class->lock);
 		}
 	}
@@ -2022,7 +2046,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		putback_zspage(class, dst_zspage);
 
 	spin_unlock(&class->lock);
-	write_unlock(&pool->migrate_lock);
+	pool_write_unlock(pool);
 
 	return pages_freed;
 }
@@ -2159,7 +2183,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		return NULL;
 
 	init_deferred_free(pool);
-	rwlock_init(&pool->migrate_lock);
+	init_rwsem(&pool->migrate_lock);
 	atomic_set(&pool->compaction_in_progress, 0);
 
 	pool->name = kstrdup(name, GFP_KERNEL);
-- 
2.48.1.262.g85cc9f2d1e-goog

From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 4/6] zsmalloc: make class lock sleepable
Date: Mon, 27 Jan 2025 16:59:29 +0900
Message-ID: <20250127080254.1302026-5-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Switch over from a spinlock to a mutex and introduce simple helpers to
lock/unlock a size class. This is needed to make zsmalloc preemptible
in the future.
Signed-off-by: Sergey Senozhatsky
---
 mm/zsmalloc.c | 54 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 751871ec533f..a5c1f9852072 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -168,7 +168,7 @@ static struct dentry *zs_stat_root;
 static size_t huge_class_size;
 
 struct size_class {
-	spinlock_t lock;
+	struct mutex lock;
 	struct list_head fullness_list[NR_FULLNESS_GROUPS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
@@ -252,6 +252,16 @@ static bool zspool_lock_is_contended(struct zs_pool *pool)
 	return rwsem_is_contended(&pool->migrate_lock);
 }
 
+static void size_class_lock(struct size_class *class)
+{
+	mutex_lock(&class->lock);
+}
+
+static void size_class_unlock(struct size_class *class)
+{
+	mutex_unlock(&class->lock);
+}
+
 static inline void zpdesc_set_first(struct zpdesc *zpdesc)
 {
 	SetPagePrivate(zpdesc_page(zpdesc));
@@ -657,8 +667,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		if (class->index != i)
 			continue;
 
-		spin_lock(&class->lock);
-
+		size_class_lock(class);
 		seq_printf(s, " %5u %5u ", i, class->size);
 		for (fg = ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) {
 			inuse_totals[fg] += class_stat_read(class, fg);
@@ -668,7 +677,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		obj_allocated = class_stat_read(class, ZS_OBJS_ALLOCATED);
 		obj_used = class_stat_read(class, ZS_OBJS_INUSE);
 		freeable = zs_can_compact(class);
-		spin_unlock(&class->lock);
+		size_class_unlock(class);
 
 		objs_per_zspage = class->objs_per_zspage;
 		pages_used = obj_allocated / objs_per_zspage *
@@ -926,8 +935,6 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 {
 	struct zpdesc *zpdesc, *next;
 
-	assert_spin_locked(&class->lock);
-
 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0);
 
@@ -1443,7 +1450,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	class = pool->size_class[get_size_class_index(size)];
 
 	/* class->lock effectively protects the zpage migration */
-	spin_lock(&class->lock);
+	size_class_lock(class);
 	zspage = find_get_zspage(class);
 	if (likely(zspage)) {
 		obj_malloc(pool, zspage, handle);
@@ -1453,8 +1460,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 
 		goto out;
 	}
-
-	spin_unlock(&class->lock);
+	size_class_unlock(class);
 
 	zspage = alloc_zspage(pool, class, gfp);
 	if (!zspage) {
@@ -1462,7 +1468,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		return (unsigned long)ERR_PTR(-ENOMEM);
 	}
 
-	spin_lock(&class->lock);
+	size_class_lock(class);
 	obj_malloc(pool, zspage, handle);
 	newfg = get_fullness_group(class, zspage);
 	insert_zspage(class, zspage, newfg);
@@ -1473,7 +1479,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
 out:
-	spin_unlock(&class->lock);
+	size_class_unlock(class);
 
 	return handle;
 }
@@ -1527,7 +1533,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	obj_to_zpdesc(obj, &f_zpdesc);
 	zspage = get_zspage(f_zpdesc);
 	class = zspage_class(pool, zspage);
-	spin_lock(&class->lock);
+	size_class_lock(class);
 	pool_read_unlock(pool);
 
 	class_stat_sub(class, ZS_OBJS_INUSE, 1);
@@ -1537,7 +1543,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	if (fullness == ZS_INUSE_RATIO_0)
 		free_zspage(pool, class, zspage);
 
-	spin_unlock(&class->lock);
+	size_class_unlock(class);
 	cache_free_handle(pool, handle);
 }
 EXPORT_SYMBOL_GPL(zs_free);
@@ -1846,7 +1852,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	/*
 	 * the class lock protects zpage alloc/free in the zspage.
 	 */
-	spin_lock(&class->lock);
+	size_class_lock(class);
 	/* the zspage_write_lock protects zpage access via zs_map_object */
 	zspage_write_lock(zspage);
 
@@ -1878,7 +1884,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 * it's okay to release migration_lock.
 	 */
 	pool_write_unlock(pool);
-	spin_unlock(&class->lock);
+	size_class_unlock(class);
 	zspage_write_unlock(zspage);
 
 	zpdesc_get(newzpdesc);
@@ -1922,10 +1928,10 @@ static void async_free_zspage(struct work_struct *work)
 		if (class->index != i)
 			continue;
 
-		spin_lock(&class->lock);
+		size_class_lock(class);
 		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0],
 				 &free_pages);
-		spin_unlock(&class->lock);
+		size_class_unlock(class);
 	}
 
 	list_for_each_entry_safe(zspage, tmp, &free_pages, list) {
@@ -1933,10 +1939,10 @@ static void async_free_zspage(struct work_struct *work)
 		lock_zspage(zspage);
 
 		class = zspage_class(pool, zspage);
-		spin_lock(&class->lock);
+		size_class_lock(class);
 		class_stat_sub(class, ZS_INUSE_RATIO_0, 1);
 		__free_zspage(pool, class, zspage);
-		spin_unlock(&class->lock);
+		size_class_unlock(class);
 	}
 };
 
@@ -2001,7 +2007,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * as well as zpage allocation/free
 	 */
 	pool_write_lock(pool);
-	spin_lock(&class->lock);
+	size_class_lock(class);
 	while (zs_can_compact(class)) {
 		int fg;
 
@@ -2031,11 +2037,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			putback_zspage(class, dst_zspage);
 			dst_zspage = NULL;
 
-			spin_unlock(&class->lock);
+			size_class_unlock(class);
 			pool_write_unlock(pool);
 			cond_resched();
 			pool_write_lock(pool);
-			spin_lock(&class->lock);
+			size_class_lock(class);
 		}
 	}
 
@@ -2045,7 +2051,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	if (dst_zspage)
 		putback_zspage(class, dst_zspage);
 
-	spin_unlock(&class->lock);
+	size_class_unlock(class);
 	pool_write_unlock(pool);
 
 	return pages_freed;
@@ -2255,7 +2261,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
-		spin_lock_init(&class->lock);
+		mutex_init(&class->lock);
 		pool->size_class[i] = class;
 
 		fullness = ZS_INUSE_RATIO_0;
-- 
2.48.1.262.g85cc9f2d1e-goog

From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
Date: Mon, 27 Jan 2025 16:59:30 +0900
Message-ID: <20250127080254.1302026-6-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Introduce a new API to map/unmap a zsmalloc handle/object. The key
difference is that the new API does not impose atomicity restrictions
on its users, unlike zs_map_object(), which returns with page faults
and preemption disabled. The handle mapping API does not need a
per-CPU vm-area, because its users are required to provide an aux
buffer for objects that span several physical pages. Keep
zs_map_object()/zs_unmap_object() for the time being, as there are
still users of them, but eventually the old API will be removed.
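For reference, a condensed read-side usage sketch of the new API (modeled on
the zram conversion in patch 6 of this series; pool, handle, aux_buf, dst and
obj_size are placeholders, and aux_buf must be a caller-preallocated buffer
large enough for an object that spans two physical pages):

	struct zs_handle_mapping hm;

	hm.handle = handle;		/* handle returned by zs_malloc() */
	hm.mode = ZS_MM_RO;		/* read-only mapping */
	hm.local_copy = aux_buf;	/* placeholder: caller-provided buffer */

	if (zs_map_handle(pool, &hm))
		return -EINVAL;
	/*
	 * hm.handle_mem points at the object data: either a direct kmap
	 * of the page or the aux-buffer copy. Unlike with zs_map_object(),
	 * the caller may sleep between map and unmap.
	 */
	memcpy(dst, hm.handle_mem, obj_size);	/* placeholder destination */
	zs_unmap_handle(pool, &hm);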
Signed-off-by: Sergey Senozhatsky
---
 include/linux/zsmalloc.h |  29 ++++++++
 mm/zsmalloc.c            | 148 ++++++++++++++++++++++++++++-----------
 2 files changed, 138 insertions(+), 39 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..72d84537dd38 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -58,4 +58,33 @@ unsigned long zs_compact(struct zs_pool *pool);
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+struct zs_handle_mapping {
+	unsigned long handle;
+	/* Points to start of the object data either within local_copy or
+	 * within local_mapping. This is what callers should use to access
+	 * or modify handle data.
+	 */
+	void *handle_mem;
+
+	enum zs_mapmode mode;
+	union {
+		/*
+		 * Handle object data copied, because it spans across several
+		 * (non-contiguous) physical pages. This pointer should be
+		 * set by the zs_map_handle() caller beforehand and should
+		 * never be accessed directly.
+		 */
+		void *local_copy;
+		/*
+		 * Handle object mapped directly. Should never be used
+		 * directly.
+		 */
+		void *local_mapping;
+	};
+};
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a5c1f9852072..281bba4a3277 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1132,18 +1132,14 @@ static inline void __zs_cpu_down(struct mapping_area *area)
 	area->vm_buf = NULL;
 }
 
-static void *__zs_map_object(struct mapping_area *area,
-			struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyin(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf = area->vm_buf;
-
-	/* disable page faults to match kmap_local_page() return conditions */
-	pagefault_disable();
 
-	/* no read fastpath */
-	if (area->vm_mm == ZS_MM_WO)
-		goto out;
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
@@ -1151,21 +1147,17 @@ static void *__zs_map_object(struct mapping_area *area,
 	/* copy object to per-cpu buffer */
 	memcpy_from_page(buf, zpdesc_page(zpdescs[0]), off, sizes[0]);
 	memcpy_from_page(buf + sizes[0], zpdesc_page(zpdescs[1]), 0, sizes[1]);
-out:
-	return area->vm_buf;
 }
 
-static void __zs_unmap_object(struct mapping_area *area,
-			struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyout(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf;
 
-	/* no write fastpath */
-	if (area->vm_mm == ZS_MM_RO)
-		goto out;
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
-	buf = area->vm_buf;
 	buf = buf + ZS_HANDLE_SIZE;
 	size -= ZS_HANDLE_SIZE;
 	off += ZS_HANDLE_SIZE;
@@ -1176,10 +1168,6 @@ static void __zs_unmap_object(struct mapping_area *area,
 	/* copy per-cpu buffer to object */
 	memcpy_to_page(zpdesc_page(zpdescs[0]), off, buf, sizes[0]);
 	memcpy_to_page(zpdesc_page(zpdescs[1]), 0, buf + sizes[0], sizes[1]);
-
-out:
-	/* enable page faults to match kunmap_local() return conditions */
-	pagefault_enable();
 }
 
 static int zs_cpu_prepare(unsigned int cpu)
@@ -1260,6 +1248,8 @@ EXPORT_SYMBOL_GPL(zs_get_total_pages);
  * against nested mappings.
  *
  * This function returns with preemption and page faults disabled.
+ *
+ * NOTE: this function is deprecated and will be removed.
  */
 void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 		    enum zs_mapmode mm)
@@ -1268,10 +1258,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
-	struct zpdesc *zpdescs[2];
 	void *ret;
 
 	/*
@@ -1309,12 +1297,14 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 		goto out;
 	}
 
-	/* this object spans two pages */
-	zpdescs[0] = zpdesc;
-	zpdescs[1] = get_next_zpdesc(zpdesc);
-	BUG_ON(!zpdescs[1]);
+	ret = area->vm_buf;
+	/* disable page faults to match kmap_local_page() return conditions */
+	pagefault_disable();
+	if (mm != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(area->vm_buf, zpdesc, off, class->size);
+	}
 
-	ret = __zs_map_object(area, zpdescs, off, class->size);
 out:
 	if (likely(!ZsHugePage(zspage)))
 		ret += ZS_HANDLE_SIZE;
@@ -1323,13 +1313,13 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
+/* NOTE: this function is deprecated and will be removed. */
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 {
 	struct zspage *zspage;
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
 
@@ -1340,23 +1330,103 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = offset_in_page(class->size * obj_idx);
 
 	area = this_cpu_ptr(&zs_map_area);
-	if (off + class->size <= PAGE_SIZE)
+	if (off + class->size <= PAGE_SIZE) {
 		kunmap_local(area->vm_addr);
-	else {
-		struct zpdesc *zpdescs[2];
+		goto out;
+	}
 
-		zpdescs[0] = zpdesc;
-		zpdescs[1] = get_next_zpdesc(zpdesc);
-		BUG_ON(!zpdescs[1]);
+	if (area->vm_mm != ZS_MM_RO)
+		zs_obj_copyout(area->vm_buf, zpdesc, off, class->size);
+	/* enable page faults to match kunmap_local() return conditions */
+	pagefault_enable();
 
-		__zs_unmap_object(area, zpdescs, off, class->size);
-	}
+out:
 	local_unlock(&zs_map_area.lock);
-
 	zspage_read_unlock(zspage);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		kunmap_local(map->local_mapping);
+		goto out;
+	}
+
+	if (map->mode != ZS_MM_RO)
+		zs_obj_copyout(map->local_copy, zpdesc, off, class->size);
+
+out:
+	zspage_read_unlock(zspage);
+}
+EXPORT_SYMBOL_GPL(zs_unmap_handle);
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	WARN_ON(in_interrupt());
+
+	/* It guarantees it can get zspage from handle safely */
+	pool_read_lock(pool);
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+
+	/*
+	 * migration cannot move any zpages in this zspage. Here, class->lock
+	 * is too heavy since callers would take some time until they calls
+	 * zs_unmap_object API so delegate the locking from class to zspage
+	 * which is smaller granularity.
+	 */
+	zspage_read_lock(zspage);
+	pool_read_unlock(pool);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		map->local_mapping = kmap_local_zpdesc(zpdesc);
+		map->handle_mem = map->local_mapping + off;
+		goto out;
+	}
+
+	if (WARN_ON_ONCE(!map->local_copy)) {
+		zspage_read_unlock(zspage);
+		return -EINVAL;
+	}
+
+	map->handle_mem = map->local_copy;
+	if (map->mode != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(map->local_copy, zpdesc, off, class->size);
+	}
+
+out:
+	if (likely(!ZsHugePage(zspage)))
+		map->handle_mem += ZS_HANDLE_SIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(zs_map_handle);
+
 /**
  * zs_huge_class_size() - Returns the size (in bytes) of the first huge
  * zsmalloc &size_class.
-- 
2.48.1.262.g85cc9f2d1e-goog

From nobody Thu Jan 30 18:52:51 2025
From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 6/6] zram: switch over to zshandle mapping API
Date: Mon, 27 Jan 2025 16:59:31 +0900
Message-ID: <20250127080254.1302026-7-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Use the new zsmalloc handle mapping API, so that zram read() becomes
preemptible.
 drivers/block/zram/zcomp.c    |   4 +-
 drivers/block/zram/zcomp.h    |   2 +
 drivers/block/zram/zram_drv.c | 103 ++++++++++++++++++----------------
 3 files changed, 61 insertions(+), 48 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index efd5919808d9..9b373ab1ee0b 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -45,6 +45,7 @@ static const struct zcomp_ops *backends[] = {
 static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *strm)
 {
 	comp->ops->destroy_ctx(&strm->ctx);
+	vfree(strm->handle_mem_copy);
 	vfree(strm->buffer);
 	kfree(strm);
 }
@@ -66,12 +67,13 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
 		return NULL;
 	}
 
+	strm->handle_mem_copy = vzalloc(PAGE_SIZE);
 	/*
 	 * allocate 2 pages. 1 for compressed data, plus 1 extra in case if
 	 * compressed data is larger than the original one.
 	 */
 	strm->buffer = vzalloc(2 * PAGE_SIZE);
-	if (!strm->buffer) {
+	if (!strm->buffer || !strm->handle_mem_copy) {
 		zcomp_strm_free(comp, strm);
 		return NULL;
 	}
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 62330829db3f..f003f09820a5 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -34,6 +34,8 @@ struct zcomp_strm {
 	struct list_head entry;
 	/* compression buffer */
 	void *buffer;
+	/* handle object memory copy */
+	void *handle_mem_copy;
 	struct zcomp_ctx ctx;
 };
 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9c72beb86ab0..120055b11520 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1558,37 +1558,43 @@ static int read_same_filled_page(struct zram *zram, struct page *page,
 static int read_incompressible_page(struct zram *zram, struct page *page,
 				    u32 index)
 {
-	unsigned long handle;
-	void *src, *dst;
+	struct zs_handle_mapping hm;
+	void *dst;
 
-	handle = zram_get_handle(zram, index);
-	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
+	hm.handle = zram_get_handle(zram, index);
+	hm.mode = ZS_MM_RO;
+
+	zs_map_handle(zram->mem_pool, &hm);
 	dst = kmap_local_page(page);
-	copy_page(dst, src);
+	copy_page(dst, hm.handle_mem);
 	kunmap_local(dst);
-	zs_unmap_object(zram->mem_pool, handle);
+	zs_unmap_handle(zram->mem_pool, &hm);
 
 	return 0;
 }
 
 static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
 {
+	struct zs_handle_mapping hm;
 	struct zcomp_strm *zstrm;
-	unsigned long handle;
 	unsigned int size;
-	void *src, *dst;
+	void *dst;
 	int ret, prio;
 
-	handle = zram_get_handle(zram, index);
 	size = zram_get_obj_size(zram, index);
 	prio = zram_get_priority(zram, index);
 
 	zstrm = zcomp_stream_get(zram->comps[prio]);
-	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
+	hm.handle = zram_get_handle(zram, index);
+	hm.mode = ZS_MM_RO;
+	hm.local_copy = zstrm->handle_mem_copy;
+
+	zs_map_handle(zram->mem_pool, &hm);
 	dst = kmap_local_page(page);
-	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
+	ret = zcomp_decompress(zram->comps[prio], zstrm,
+			       hm.handle_mem, size, dst);
 	kunmap_local(dst);
-	zs_unmap_object(zram->mem_pool, handle);
+	zs_unmap_handle(zram->mem_pool, &hm);
 	zcomp_stream_put(zram->comps[prio], zstrm);
 
 	return ret;
@@ -1683,33 +1689,34 @@ static int write_same_filled_page(struct zram *zram, unsigned long fill,
 static int write_incompressible_page(struct zram *zram, struct page *page,
 				     u32 index)
 {
-	unsigned long handle;
-	void *src, *dst;
+	struct zs_handle_mapping hm;
+	void *src;
 
 	/*
 	 * This function is called from preemptible context so we don't need
 	 * to do optimistic and fallback to pessimistic handle allocation,
 	 * like we do for compressible pages.
 	 */
-	handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
-			   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
-	if (IS_ERR_VALUE(handle))
-		return PTR_ERR((void *)handle);
+	hm.handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
+			      GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
+	if (IS_ERR_VALUE(hm.handle))
+		return PTR_ERR((void *)hm.handle);
 
 	if (!zram_can_store_page(zram)) {
-		zs_free(zram->mem_pool, handle);
+		zs_free(zram->mem_pool, hm.handle);
 		return -ENOMEM;
 	}
 
-	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
+	hm.mode = ZS_MM_WO;
+	zs_map_handle(zram->mem_pool, &hm);
 	src = kmap_local_page(page);
-	memcpy(dst, src, PAGE_SIZE);
+	memcpy(hm.handle_mem, src, PAGE_SIZE);
 	kunmap_local(src);
-	zs_unmap_object(zram->mem_pool, handle);
+	zs_unmap_handle(zram->mem_pool, &hm);
 
 	zram_slot_write_lock(zram, index);
 	zram_set_flag(zram, index, ZRAM_HUGE);
-	zram_set_handle(zram, index, handle);
+	zram_set_handle(zram, index, hm.handle);
 	zram_set_obj_size(zram, index, PAGE_SIZE);
 	zram_slot_write_unlock(zram, index);
 
@@ -1724,9 +1731,9 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
 static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 {
 	int ret = 0;
-	unsigned long handle;
+	struct zs_handle_mapping hm;
 	unsigned int comp_len;
-	void *dst, *mem;
+	void *mem;
 	struct zcomp_strm *zstrm;
 	unsigned long element;
 	bool same_filled;
@@ -1758,25 +1765,26 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		return write_incompressible_page(zram, page, index);
 	}
 
-	handle = zs_malloc(zram->mem_pool, comp_len,
-			   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
-	if (IS_ERR_VALUE(handle))
-		return PTR_ERR((void *)handle);
+	hm.handle = zs_malloc(zram->mem_pool, comp_len,
+			      GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
+	if (IS_ERR_VALUE(hm.handle))
+		return PTR_ERR((void *)hm.handle);
 
 	if (!zram_can_store_page(zram)) {
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
-		zs_free(zram->mem_pool, handle);
+		zs_free(zram->mem_pool, hm.handle);
 		return -ENOMEM;
 	}
 
-	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
-
-	memcpy(dst, zstrm->buffer, comp_len);
+	hm.mode = ZS_MM_WO;
+	hm.local_copy = zstrm->handle_mem_copy;
+	zs_map_handle(zram->mem_pool, &hm);
+	memcpy(hm.handle_mem, zstrm->buffer, comp_len);
+	zs_unmap_handle(zram->mem_pool, &hm);
 	zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
-	zs_unmap_object(zram->mem_pool, handle);
 
 	zram_slot_write_lock(zram, index);
-	zram_set_handle(zram, index, handle);
+	zram_set_handle(zram, index, hm.handle);
 	zram_set_obj_size(zram, index, comp_len);
 	zram_slot_write_unlock(zram, index);
 
@@ -1875,14 +1883,14 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 			   u32 prio_max)
 {
 	struct zcomp_strm *zstrm = NULL;
+	struct zs_handle_mapping hm;
 	unsigned long handle_old;
-	unsigned long handle_new;
 	unsigned int comp_len_old;
 	unsigned int comp_len_new;
 	unsigned int class_index_old;
 	unsigned int class_index_new;
 	u32 num_recomps = 0;
-	void *src, *dst;
+	void *src;
 	int ret;
 
 	handle_old = zram_get_handle(zram, index);
@@ -2000,34 +2008,35 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 
 	/* zsmalloc handle allocation can schedule, unlock slot's bucket */
 	zram_slot_write_unlock(zram, index);
-	handle_new = zs_malloc(zram->mem_pool, comp_len_new,
-			       GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
+	hm.handle = zs_malloc(zram->mem_pool, comp_len_new,
+			      GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
 	zram_slot_write_lock(zram, index);
 
 	/*
 	 * If we couldn't allocate memory for recompressed object then bail
 	 * out and simply keep the old (existing) object in mempool.
 	 */
-	if (IS_ERR_VALUE(handle_new)) {
+	if (IS_ERR_VALUE(hm.handle)) {
 		zcomp_stream_put(zram->comps[prio], zstrm);
-		return PTR_ERR((void *)handle_new);
+		return PTR_ERR((void *)hm.handle);
 	}
 
 	/* Slot has been modified concurrently */
 	if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) {
 		zcomp_stream_put(zram->comps[prio], zstrm);
-		zs_free(zram->mem_pool, handle_new);
+		zs_free(zram->mem_pool, hm.handle);
 		return 0;
 	}
 
-	dst = zs_map_object(zram->mem_pool, handle_new, ZS_MM_WO);
-	memcpy(dst, zstrm->buffer, comp_len_new);
+	hm.mode = ZS_MM_WO;
+	hm.local_copy = zstrm->handle_mem_copy;
+	zs_map_handle(zram->mem_pool, &hm);
+	memcpy(hm.handle_mem, zstrm->buffer, comp_len_new);
+	zs_unmap_handle(zram->mem_pool, &hm);
 	zcomp_stream_put(zram->comps[prio], zstrm);
 
-	zs_unmap_object(zram->mem_pool, handle_new);
-	zram_free_page(zram, index);
-	zram_set_handle(zram, index, handle_new);
+	zram_set_handle(zram, index, hm.handle);
 	zram_set_obj_size(zram, index, comp_len_new);
 	zram_set_priority(zram, index, prio);
 
-- 
2.48.1.262.g85cc9f2d1e-goog
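Taken together, the three write-side call sites converted above (huge pages, regular writes, recompression) reduce to one pattern; a sketch under the same assumptions as before, with `len` standing in for PAGE_SIZE, comp_len or comp_len_new:

	struct zs_handle_mapping hm;

	hm.handle = zs_malloc(zram->mem_pool, len,
			      GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
	if (IS_ERR_VALUE(hm.handle))
		return PTR_ERR((void *)hm.handle);

	hm.mode = ZS_MM_WO;
	/*
	 * The bounce page is vzalloc()-ed once per compression stream at
	 * zcomp_strm_alloc() time, so the write path never allocates it;
	 * it is only used when the object straddles a page boundary.
	 */
	hm.local_copy = zstrm->handle_mem_copy;
	zs_map_handle(zram->mem_pool, &hm);
	memcpy(hm.handle_mem, zstrm->buffer, len);
	zs_unmap_handle(zram->mem_pool, &hm);

Keeping the scratch buffer in struct zcomp_strm rather than allocating it
per request means the ZS_MM_WO mapping itself involves no allocation, which
is what lets these paths stay preemptible without adding failure cases.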