From: Sergey Senozhatsky
To: Andrew Morton, Minchan Kim
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [PATCHv2 1/9] zram: switch to non-atomic entry locking
Date: Mon, 27 Jan 2025 16:29:12 +0900
Message-ID: <20250127072932.1289973-2-senozhatsky@chromium.org>
In-Reply-To: <20250127072932.1289973-1-senozhatsky@chromium.org>
References: <20250127072932.1289973-1-senozhatsky@chromium.org>

Concurrent modifications of meta table entries are currently handled
by a per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomic requirements on compression backends. zram
can call both zcomp_compress() and zcomp_decompress() under the entry
spin-lock, which implies that we can use only compression algorithms
that don't schedule/sleep/wait during compression and decompression.
This, for instance, makes it impossible to use async compression
algorithm implementations (H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under the entry
spin-lock. Given that we chain secondary compression algorithms, and
that some of them can be configured for the best compression ratio
(and hence the worst compression speed), zram can stay under the
spin-lock for quite some time.

Do not use per-entry spin-locks; instead, convert the lock to an
atomic_t variable that open-codes a reader-writer type of lock. This
permits preemption from the slot_lock section.
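For reviewers' convenience, here is a minimal sketch of the locking
scheme, modelled in userspace with C11 atomics rather than the kernel's
atomic_t API. It is illustrative only, not part of the patch:
sched_yield() stands in for cond_resched(), and all names are made up.

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <sched.h>

  /* Lock state: -1 == write-locked, 0 == unlocked, > 0 == reader count */
  #define ENTRY_UNLOCKED   0
  #define ENTRY_WRLOCKED (-1)

  static bool slot_write_trylock(atomic_int *lock)
  {
          int old = ENTRY_UNLOCKED;

          /* A writer may enter only when no readers and no writer hold it */
          return atomic_compare_exchange_strong(lock, &old, ENTRY_WRLOCKED);
  }

  static void slot_write_lock(atomic_int *lock)
  {
          while (!slot_write_trylock(lock))
                  sched_yield();  /* preemptible: reschedule and retry */
  }

  static void slot_write_unlock(atomic_int *lock)
  {
          atomic_store(lock, ENTRY_UNLOCKED);
  }

  static void slot_read_lock(atomic_int *lock)
  {
          for (;;) {
                  int old = atomic_load(lock);

                  if (old != ENTRY_WRLOCKED &&
                      atomic_compare_exchange_strong(lock, &old, old + 1))
                          return; /* one more reader registered */

                  sched_yield();  /* writer active or CAS raced; retry */
          }
  }

  static void slot_read_unlock(atomic_int *lock)
  {
          atomic_fetch_sub(lock, 1);  /* last reader drops it back to 0 */
  }

Unlike the spin-lock variant, both sides may reschedule between
acquisition attempts, which is what makes the slot-lock section
preemptible.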
Signed-off-by: Sergey Senozhatsky
---
 drivers/block/zram/zram_drv.c | 155 +++++++++++++++++++++-------------
 drivers/block/zram/zram_drv.h |   7 +-
 2 files changed, 98 insertions(+), 64 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..14859bd2611f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,57 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
                                  u32 index);

-static int zram_slot_trylock(struct zram *zram, u32 index)
+static int zram_slot_write_trylock(struct zram *zram, u32 index)
 {
-        return spin_trylock(&zram->table[index].lock);
+        int old;
+
+        old = atomic_cmpxchg(&zram->table[index].lock, ZRAM_ENTRY_UNLOCKED,
+                             ZRAM_ENTRY_WRLOCKED);
+        return old == ZRAM_ENTRY_UNLOCKED;
 }

-static void zram_slot_lock(struct zram *zram, u32 index)
+static void zram_slot_write_lock(struct zram *zram, u32 index)
 {
-        spin_lock(&zram->table[index].lock);
+        atomic_t *lock = &zram->table[index].lock;
+        int old;
+
+        while (1) {
+                old = atomic_cmpxchg(lock, ZRAM_ENTRY_UNLOCKED,
+                                     ZRAM_ENTRY_WRLOCKED);
+                if (old == ZRAM_ENTRY_UNLOCKED)
+                        return;
+
+                cond_resched();
+        }
 }

-static void zram_slot_unlock(struct zram *zram, u32 index)
+static void zram_slot_write_unlock(struct zram *zram, u32 index)
 {
-        spin_unlock(&zram->table[index].lock);
+        atomic_set(&zram->table[index].lock, ZRAM_ENTRY_UNLOCKED);
+}
+
+static void zram_slot_read_lock(struct zram *zram, u32 index)
+{
+        atomic_t *lock = &zram->table[index].lock;
+        int old;
+
+        while (1) {
+                old = atomic_read(lock);
+                if (old == ZRAM_ENTRY_WRLOCKED) {
+                        cond_resched();
+                        continue;
+                }
+
+                if (atomic_cmpxchg(lock, old, old + 1) == old)
+                        return;
+
+                cond_resched();
+        }
+}
+
+static void zram_slot_read_unlock(struct zram *zram, u32 index)
+{
+        atomic_dec(&zram->table[index].lock);
 }

 static inline bool init_done(struct zram *zram)
@@ -93,7 +131,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
         zram->table[index].handle = handle;
 }

-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
                         enum zram_pageflags flag)
 {
@@ -229,9 +266,9 @@ static void release_pp_slot(struct zram *zram, struct zram_pp_slot *pps)
 {
         list_del_init(&pps->entry);

-        zram_slot_lock(zram, pps->index);
+        zram_slot_write_lock(zram, pps->index);
         zram_clear_flag(zram, pps->index, ZRAM_PP_SLOT);
-        zram_slot_unlock(zram, pps->index);
+        zram_slot_write_unlock(zram, pps->index);

         kfree(pps);
 }
@@ -394,11 +431,11 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
                  *
                  * And ZRAM_WB slots simply cannot be ZRAM_IDLE.
                  */
-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 if (!zram_allocated(zram, index) ||
                     zram_test_flag(zram, index, ZRAM_WB) ||
                     zram_test_flag(zram, index, ZRAM_SAME)) {
-                        zram_slot_unlock(zram, index);
+                        zram_slot_write_unlock(zram, index);
                         continue;
                 }

@@ -410,7 +447,7 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
                         zram_set_flag(zram, index, ZRAM_IDLE);
                 else
                         zram_clear_flag(zram, index, ZRAM_IDLE);
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);
         }
 }

@@ -709,7 +746,7 @@ static int scan_slots_for_writeback(struct zram *zram, u32 mode,

                 INIT_LIST_HEAD(&pps->entry);

-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 if (!zram_allocated(zram, index))
                         goto next;

@@ -731,7 +768,7 @@ static int scan_slots_for_writeback(struct zram *zram, u32 mode,
                 place_pp_slot(zram, ctl, pps);
                 pps = NULL;
next:
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);
         }

         kfree(pps);
@@ -822,7 +859,7 @@ static ssize_t writeback_store(struct device *dev,
                 }

                 index = pps->index;
-                zram_slot_lock(zram, index);
+                zram_slot_read_lock(zram, index);
                 /*
                  * scan_slots() sets ZRAM_PP_SLOT and relases slot lock, so
                  * slots can change in the meantime. If slots are accessed or
@@ -833,7 +870,7 @@ static ssize_t writeback_store(struct device *dev,
                         goto next;
                 if (zram_read_from_zspool(zram, page, index))
                         goto next;
-                zram_slot_unlock(zram, index);
+                zram_slot_read_unlock(zram, index);

                 bio_init(&bio, zram->bdev, &bio_vec, 1,
                          REQ_OP_WRITE | REQ_SYNC);
@@ -860,7 +897,7 @@ static ssize_t writeback_store(struct device *dev,
                 }

                 atomic64_inc(&zram->stats.bd_writes);
-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 /*
                  * Same as above, we release slot lock during writeback so
                  * slot can change under us: slot_free() or slot_free() and
@@ -882,7 +919,7 @@ static ssize_t writeback_store(struct device *dev,
                         zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
                 spin_unlock(&zram->wb_limit_lock);
next:
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);
                 release_pp_slot(zram, pps);

                 cond_resched();
@@ -1001,7 +1038,7 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
         for (index = *ppos; index < nr_pages; index++) {
                 int copied;

-                zram_slot_lock(zram, index);
+                zram_slot_read_lock(zram, index);
                 if (!zram_allocated(zram, index))
                         goto next;

@@ -1019,13 +1056,13 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
                                 ZRAM_INCOMPRESSIBLE) ? 'n' : '.');

                 if (count <= copied) {
-                        zram_slot_unlock(zram, index);
+                        zram_slot_read_unlock(zram, index);
                         break;
                 }
                 written += copied;
                 count -= copied;
next:
-                zram_slot_unlock(zram, index);
+                zram_slot_read_unlock(zram, index);
                 *ppos += 1;
         }

@@ -1455,33 +1492,31 @@ static void zram_meta_free(struct zram *zram, u64 disksize)

 static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 {
-        size_t num_pages, index;
+        size_t num_ents, index;

-        num_pages = disksize >> PAGE_SHIFT;
-        zram->table = vzalloc(array_size(num_pages, sizeof(*zram->table)));
+        num_ents = disksize >> PAGE_SHIFT;
+        zram->table = vzalloc(array_size(num_ents, sizeof(*zram->table)));
         if (!zram->table)
-                return false;
+                goto error;

         zram->mem_pool = zs_create_pool(zram->disk->disk_name);
-        if (!zram->mem_pool) {
-                vfree(zram->table);
-                zram->table = NULL;
-                return false;
-        }
+        if (!zram->mem_pool)
+                goto error;

         if (!huge_class_size)
                 huge_class_size = zs_huge_class_size(zram->mem_pool);

-        for (index = 0; index < num_pages; index++)
-                spin_lock_init(&zram->table[index].lock);
+        for (index = 0; index < num_ents; index++)
+                atomic_set(&zram->table[index].lock, ZRAM_ENTRY_UNLOCKED);
+
         return true;
+
+error:
+        vfree(zram->table);
+        zram->table = NULL;
+        return false;
 }

-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
         unsigned long handle;
@@ -1602,17 +1637,17 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 {
         int ret;

-        zram_slot_lock(zram, index);
+        zram_slot_read_lock(zram, index);
         if (!zram_test_flag(zram, index, ZRAM_WB)) {
                 /* Slot should be locked through out the function call */
                 ret = zram_read_from_zspool(zram, page, index);
-                zram_slot_unlock(zram, index);
+                zram_slot_read_unlock(zram, index);
         } else {
                 /*
                  * The slot should be unlocked before reading from the backing
                  * device.
                  */
-                zram_slot_unlock(zram, index);
+                zram_slot_read_unlock(zram, index);

                 ret = read_from_bdev(zram, page, zram_get_handle(zram, index),
                                      parent);
@@ -1655,10 +1690,10 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 static int write_same_filled_page(struct zram *zram, unsigned long fill,
                                   u32 index)
 {
-        zram_slot_lock(zram, index);
+        zram_slot_write_lock(zram, index);
         zram_set_flag(zram, index, ZRAM_SAME);
         zram_set_handle(zram, index, fill);
-        zram_slot_unlock(zram, index);
+        zram_slot_write_unlock(zram, index);

         atomic64_inc(&zram->stats.same_pages);
         atomic64_inc(&zram->stats.pages_stored);
@@ -1693,11 +1728,11 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
         kunmap_local(src);
         zs_unmap_object(zram->mem_pool, handle);

-        zram_slot_lock(zram, index);
+        zram_slot_write_lock(zram, index);
         zram_set_flag(zram, index, ZRAM_HUGE);
         zram_set_handle(zram, index, handle);
         zram_set_obj_size(zram, index, PAGE_SIZE);
-        zram_slot_unlock(zram, index);
+        zram_slot_write_unlock(zram, index);

         atomic64_add(PAGE_SIZE, &zram->stats.compr_data_size);
         atomic64_inc(&zram->stats.huge_pages);
@@ -1718,9 +1753,9 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
         bool same_filled;

         /* First, free memory allocated to this slot (if any) */
-        zram_slot_lock(zram, index);
+        zram_slot_write_lock(zram, index);
         zram_free_page(zram, index);
-        zram_slot_unlock(zram, index);
+        zram_slot_write_unlock(zram, index);

         mem = kmap_local_page(page);
         same_filled = page_same_filled(mem, &element);
@@ -1790,10 +1825,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
         zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
         zs_unmap_object(zram->mem_pool, handle);

-        zram_slot_lock(zram, index);
+        zram_slot_write_lock(zram, index);
         zram_set_handle(zram, index, handle);
         zram_set_obj_size(zram, index, comp_len);
-        zram_slot_unlock(zram, index);
+        zram_slot_write_unlock(zram, index);

         /* Update stats */
         atomic64_inc(&zram->stats.pages_stored);
@@ -1850,7 +1885,7 @@ static int scan_slots_for_recompress(struct zram *zram, u32 mode,

                 INIT_LIST_HEAD(&pps->entry);

-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 if (!zram_allocated(zram, index))
                         goto next;

@@ -1871,7 +1906,7 @@ static int scan_slots_for_recompress(struct zram *zram, u32 mode,
                 place_pp_slot(zram, ctl, pps);
                 pps = NULL;
next:
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);
         }

         kfree(pps);
@@ -2162,7 +2197,7 @@ static ssize_t recompress_store(struct device *dev,
                 if (!num_recomp_pages)
                         break;

-                zram_slot_lock(zram, pps->index);
+                zram_slot_write_lock(zram, pps->index);
                 if (!zram_test_flag(zram, pps->index, ZRAM_PP_SLOT))
                         goto next;

@@ -2170,7 +2205,7 @@ static ssize_t recompress_store(struct device *dev,
                                       &num_recomp_pages, threshold,
                                       prio, prio_max);
next:
-                zram_slot_unlock(zram, pps->index);
+                zram_slot_write_unlock(zram, pps->index);
                 release_pp_slot(zram, pps);

                 if (err) {
@@ -2217,9 +2252,9 @@ static void zram_bio_discard(struct zram *zram, struct bio *bio)
         }

         while (n >= PAGE_SIZE) {
-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 zram_free_page(zram, index);
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);
                 atomic64_inc(&zram->stats.notify_free);
                 index++;
                 n -= PAGE_SIZE;
@@ -2248,9 +2283,9 @@ static void zram_bio_read(struct zram *zram, struct bio *bio)
                 }
                 flush_dcache_page(bv.bv_page);

-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 zram_accessed(zram, index);
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);

                 bio_advance_iter_single(bio, &iter, bv.bv_len);
         } while (iter.bi_size);
@@ -2278,9 +2313,9 @@ static void zram_bio_write(struct zram *zram, struct bio *bio)
                         break;
                 }

-                zram_slot_lock(zram, index);
+                zram_slot_write_lock(zram, index);
                 zram_accessed(zram, index);
-                zram_slot_unlock(zram, index);
+                zram_slot_write_unlock(zram, index);

                 bio_advance_iter_single(bio, &iter, bv.bv_len);
         } while (iter.bi_size);
@@ -2321,13 +2356,13 @@ static void zram_slot_free_notify(struct block_device *bdev,
         zram = bdev->bd_disk->private_data;

         atomic64_inc(&zram->stats.notify_free);
-        if (!zram_slot_trylock(zram, index)) {
+        if (!zram_slot_write_trylock(zram, index)) {
                 atomic64_inc(&zram->stats.miss_free);
                 return;
         }

         zram_free_page(zram, index);
-        zram_slot_unlock(zram, index);
+        zram_slot_write_unlock(zram, index);
 }

 static void zram_comp_params_reset(struct zram *zram)
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..3436ddf8ab23 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -15,7 +15,6 @@
 #ifndef _ZRAM_DRV_H_
 #define _ZRAM_DRV_H_

-#include <linux/spinlock.h>
 #include <linux/rwsem.h>
 #include <linux/zsmalloc.h>

@@ -28,7 +27,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK \
         (1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))

-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -58,13 +56,14 @@ enum zram_pageflags {
         __NR_ZRAM_PAGEFLAGS,
 };

-/*-- Data structures */
+#define ZRAM_ENTRY_UNLOCKED 0
+#define ZRAM_ENTRY_WRLOCKED (-1)

 /* Allocated for each disk page */
 struct zram_table_entry {
         unsigned long handle;
         unsigned int flags;
-        spinlock_t lock;
+        atomic_t lock;
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
         ktime_t ac_time;
 #endif
--
2.48.1.262.g85cc9f2d1e-goog