From: Christoph Böhmwalder
To: Jens Axboe
Cc: drbd-dev@lists.linbit.com, linux-kernel@vger.kernel.org,
    Lars Ellenberg, Philipp Reisner, linux-block@vger.kernel.org,
    Christoph Böhmwalder
Subject: [PATCH] drbd: Remove the open-coded page pool
Date: Thu, 5 Jun 2025 12:38:52 +0200
Message-ID: <20250605103852.23029-1-christoph.boehmwalder@linbit.com>
X-Mailer: git-send-email 2.49.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: Philipp Reisner

If the network stack holds on to page references for too long, DRBD in
turn keeps references on a correspondingly larger number of pages. Fix
that by no longer relying on page reference counts dropping back to an
expected value; instead, DRBD gives up its own reference and lets the
system take care of the rest.

While at it, remove the open-coded custom page pool mechanism and use
the mempool-backed page pool provided by the kernel instead.
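
For illustration only, and not part of the patch itself: the kernel pool
used here is a mempool backed by whole pages. A minimal sketch of that API,
with hypothetical names, looks roughly like this:

    #include <linux/gfp.h>
    #include <linux/mempool.h>
    #include <linux/mm.h>

    static mempool_t example_page_pool;    /* hypothetical order-0 page pool */

    static int example_setup(int min_pages)
    {
            /* pre-allocate min_pages order-0 pages as an emergency reserve */
            return mempool_init_page_pool(&example_page_pool, min_pages, 0);
    }

    static void example_roundtrip(void)
    {
            /* tries the page allocator first, then the reserve; returns NULL
             * when both fail and the gfp flags do not allow sleeping */
            struct page *page = mempool_alloc(&example_page_pool,
                                              __GFP_HIGHMEM | __GFP_NOWARN);

            if (page)
                    /* goes back into the reserve while it is below min_nr,
                     * otherwise the page is freed to the system */
                    mempool_free(page, &example_page_pool);
    }

    static void example_teardown(void)
    {
            mempool_exit(&example_page_pool);
    }

Because mempool_alloc() only dips into the pre-allocated reserve when the
page allocator fails, the driver can still make progress under memory
pressure, which is the property the hand-rolled drbd_pp_pool existed to
provide.
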
Signed-off-by: Philipp Reisner
Signed-off-by: Christoph Böhmwalder
Tested-by: Eric Hagberg
---
 drivers/block/drbd/drbd_int.h      |  39 +----
 drivers/block/drbd/drbd_main.c     |  59 ++-----
 drivers/block/drbd/drbd_receiver.c | 264 ++++-------------------------
 drivers/block/drbd/drbd_worker.c   |  56 ++----
 4 files changed, 71 insertions(+), 347 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index e21492981f7d..f6d6276974ee 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -380,6 +380,9 @@ enum {
 	/* this is/was a write request */
 	__EE_WRITE,
 
+	/* hand back using mempool_free(e, drbd_buffer_page_pool) */
+	__EE_RELEASE_TO_MEMPOOL,
+
 	/* this is/was a write same request */
 	__EE_WRITE_SAME,
 
@@ -402,6 +405,7 @@ enum {
 #define EE_IN_INTERVAL_TREE	(1<<__EE_IN_INTERVAL_TREE)
 #define EE_SUBMITTED		(1<<__EE_SUBMITTED)
 #define EE_WRITE		(1<<__EE_WRITE)
+#define EE_RELEASE_TO_MEMPOOL	(1<<__EE_RELEASE_TO_MEMPOOL)
 #define EE_WRITE_SAME		(1<<__EE_WRITE_SAME)
 #define EE_APPLICATION		(1<<__EE_APPLICATION)
 #define EE_RS_THIN_REQ		(1<<__EE_RS_THIN_REQ)
@@ -858,7 +862,6 @@ struct drbd_device {
 	struct list_head sync_ee;   /* IO in progress (P_RS_DATA_REPLY gets written to disk) */
 	struct list_head done_ee;   /* need to send P_WRITE_ACK */
 	struct list_head read_ee;   /* [RS]P_DATA_REQUEST being read */
-	struct list_head net_ee;    /* zero-copy network send in progress */
 
 	struct list_head resync_reads;
 	atomic_t pp_in_use;		/* allocated from page pool */
@@ -1329,24 +1332,6 @@ extern struct kmem_cache *drbd_al_ext_cache;	/* activity log extents */
 extern mempool_t drbd_request_mempool;
 extern mempool_t drbd_ee_mempool;
 
-/* drbd's page pool, used to buffer data received from the peer,
- * or data requested by the peer.
- *
- * This does not have an emergency reserve.
- *
- * When allocating from this pool, it first takes pages from the pool.
- * Only if the pool is depleted will try to allocate from the system.
- *
- * The assumption is that pages taken from this pool will be processed,
- * and given back, "quickly", and then can be recycled, so we can avoid
- * frequent calls to alloc_page(), and still will be able to make progress even
- * under memory pressure.
- */
-extern struct page *drbd_pp_pool;
-extern spinlock_t drbd_pp_lock;
-extern int drbd_pp_vacant;
-extern wait_queue_head_t drbd_pp_wait;
-
 /* We also need a standard (emergency-reserve backed) page pool
  * for meta data IO (activity log, bitmap).
  * We can keep it global, as long as it is used as "N pages at a time".
@@ -1354,6 +1339,7 @@ extern wait_queue_head_t drbd_pp_wait;
  */
 #define DRBD_MIN_POOL_PAGES	128
 extern mempool_t drbd_md_io_page_pool;
+extern mempool_t drbd_buffer_page_pool;
 
 /* We also need to make sure we get a bio
  * when we need it for housekeeping purposes */
@@ -1488,10 +1474,7 @@ extern struct drbd_peer_request *drbd_alloc_peer_req(struct drbd_peer_device *,
 						     sector_t, unsigned int,
 						     unsigned int,
 						     gfp_t) __must_hold(local);
-extern void __drbd_free_peer_req(struct drbd_device *, struct drbd_peer_request *,
-				 int);
-#define drbd_free_peer_req(m,e) __drbd_free_peer_req(m, e, 0)
-#define drbd_free_net_peer_req(m,e) __drbd_free_peer_req(m, e, 1)
+extern void drbd_free_peer_req(struct drbd_device *device, struct drbd_peer_request *req);
 extern struct page *drbd_alloc_pages(struct drbd_peer_device *, unsigned int, bool);
 extern void _drbd_clear_done_ee(struct drbd_device *device, struct list_head *to_be_freed);
 extern int drbd_connected(struct drbd_peer_device *);
@@ -1610,16 +1593,6 @@ static inline struct page *page_chain_next(struct page *page)
 	for (; page && ({ n = page_chain_next(page); 1; }); page = n)
 
 
-static inline int drbd_peer_req_has_active_page(struct drbd_peer_request *peer_req)
-{
-	struct page *page = peer_req->pages;
-	page_chain_for_each(page) {
-		if (page_count(page) > 1)
-			return 1;
-	}
-	return 0;
-}
-
 static inline union drbd_state drbd_read_state(struct drbd_device *device)
 {
 	struct drbd_resource *resource = device->resource;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index ced2cc5f46f2..96ab63896e90 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -114,20 +114,10 @@ struct kmem_cache *drbd_al_ext_cache;	/* activity log extents */
 mempool_t drbd_request_mempool;
 mempool_t drbd_ee_mempool;
 mempool_t drbd_md_io_page_pool;
+mempool_t drbd_buffer_page_pool;
 struct bio_set drbd_md_io_bio_set;
 struct bio_set drbd_io_bio_set;
 
-/* I do not use a standard mempool, because:
-   1) I want to hand out the pre-allocated objects first.
-   2) I want to be able to interrupt sleeping allocation with a signal.
-   Note: This is a single linked list, the next pointer is the private
-	 member of struct page.
- */
-struct page *drbd_pp_pool;
-DEFINE_SPINLOCK(drbd_pp_lock);
-int drbd_pp_vacant;
-wait_queue_head_t drbd_pp_wait;
-
 DEFINE_RATELIMIT_STATE(drbd_ratelimit_state, 5 * HZ, 5);
 
 static const struct block_device_operations drbd_ops = {
@@ -1611,6 +1601,7 @@ static int _drbd_send_zc_bio(struct drbd_peer_device *peer_device, struct bio *b
 static int _drbd_send_zc_ee(struct drbd_peer_device *peer_device,
 			    struct drbd_peer_request *peer_req)
 {
+	bool use_sendpage = !(peer_req->flags & EE_RELEASE_TO_MEMPOOL);
 	struct page *page = peer_req->pages;
 	unsigned len = peer_req->i.size;
 	int err;
@@ -1619,8 +1610,13 @@ static int _drbd_send_zc_ee(struct drbd_peer_device *peer_device,
 	page_chain_for_each(page) {
 		unsigned l = min_t(unsigned, len, PAGE_SIZE);
 
-		err = _drbd_send_page(peer_device, page, 0, l,
-				      page_chain_next(page) ? MSG_MORE : 0);
+		if (likely(use_sendpage))
+			err = _drbd_send_page(peer_device, page, 0, l,
+					      page_chain_next(page) ? MSG_MORE : 0);
+		else
+			err = _drbd_no_send_page(peer_device, page, 0, l,
+						 page_chain_next(page) ? MSG_MORE : 0);
+
 		if (err)
 			return err;
 		len -= l;
@@ -1962,7 +1958,6 @@ void drbd_init_set_defaults(struct drbd_device *device)
 	INIT_LIST_HEAD(&device->sync_ee);
 	INIT_LIST_HEAD(&device->done_ee);
 	INIT_LIST_HEAD(&device->read_ee);
-	INIT_LIST_HEAD(&device->net_ee);
 	INIT_LIST_HEAD(&device->resync_reads);
 	INIT_LIST_HEAD(&device->resync_work.list);
 	INIT_LIST_HEAD(&device->unplug_work.list);
@@ -2043,7 +2038,6 @@ void drbd_device_cleanup(struct drbd_device *device)
 	D_ASSERT(device, list_empty(&device->sync_ee));
 	D_ASSERT(device, list_empty(&device->done_ee));
 	D_ASSERT(device, list_empty(&device->read_ee));
-	D_ASSERT(device, list_empty(&device->net_ee));
 	D_ASSERT(device, list_empty(&device->resync_reads));
 	D_ASSERT(device, list_empty(&first_peer_device(device)->connection->sender_work.q));
 	D_ASSERT(device, list_empty(&device->resync_work.list));
@@ -2055,19 +2049,11 @@ void drbd_device_cleanup(struct drbd_device *device)
 
 static void drbd_destroy_mempools(void)
 {
-	struct page *page;
-
-	while (drbd_pp_pool) {
-		page = drbd_pp_pool;
-		drbd_pp_pool = (struct page *)page_private(page);
-		__free_page(page);
-		drbd_pp_vacant--;
-	}
-	/* D_ASSERT(device, atomic_read(&drbd_pp_vacant)==0); */
 
 	bioset_exit(&drbd_io_bio_set);
 	bioset_exit(&drbd_md_io_bio_set);
+	mempool_exit(&drbd_buffer_page_pool);
 	mempool_exit(&drbd_md_io_page_pool);
 	mempool_exit(&drbd_ee_mempool);
 	mempool_exit(&drbd_request_mempool);
@@ -2086,9 +2072,8 @@ static void drbd_destroy_mempools(void)
 
 static int drbd_create_mempools(void)
 {
-	struct page *page;
 	const int number = (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * drbd_minor_count;
-	int i, ret;
+	int ret;
 
 	/* caches */
 	drbd_request_cache = kmem_cache_create(
@@ -2125,6 +2110,10 @@ static int drbd_create_mempools(void)
 	if (ret)
 		goto Enomem;
 
+	ret = mempool_init_page_pool(&drbd_buffer_page_pool, number, 0);
+	if (ret)
+		goto Enomem;
+
 	ret = mempool_init_slab_pool(&drbd_request_mempool, number,
 				     drbd_request_cache);
 	if (ret)
@@ -2134,15 +2123,6 @@ static int drbd_create_mempools(void)
 	if (ret)
 		goto Enomem;
 
-	for (i = 0; i < number; i++) {
-		page = alloc_page(GFP_HIGHUSER);
-		if (!page)
-			goto Enomem;
-		set_page_private(page, (unsigned long)drbd_pp_pool);
-		drbd_pp_pool = page;
-	}
-	drbd_pp_vacant = number;
-
 	return 0;
 
 Enomem:
@@ -2169,10 +2149,6 @@ static void drbd_release_all_peer_reqs(struct drbd_device *device)
 	rr = drbd_free_peer_reqs(device, &device->done_ee);
 	if (rr)
 		drbd_err(device, "%d EEs in done list found!\n", rr);
-
-	rr = drbd_free_peer_reqs(device, &device->net_ee);
-	if (rr)
-		drbd_err(device, "%d EEs in net list found!\n", rr);
 }
 
 /* caution. no locking. */
@@ -2863,11 +2839,6 @@ static int __init drbd_init(void)
 		return err;
 	}
 
-	/*
-	 * allocate all necessary structs
-	 */
-	init_waitqueue_head(&drbd_pp_wait);
-
 	drbd_proc = NULL; /* play safe for drbd_cleanup */
 	idr_init(&drbd_devices);
 
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index e5a2e5f7887b..fe3d0521de0b 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include "drbd_int.h"
 #include "drbd_protocol.h"
 #include "drbd_req.h"
@@ -63,182 +64,31 @@ static int e_end_block(struct drbd_work *, int);
 
 #define GFP_TRY	(__GFP_HIGHMEM | __GFP_NOWARN)
 
-/*
- * some helper functions to deal with single linked page lists,
- * page->private being our "next" pointer.
- */
-
-/* If at least n pages are linked at head, get n pages off.
- * Otherwise, don't modify head, and return NULL.
- * Locking is the responsibility of the caller.
- */
-static struct page *page_chain_del(struct page **head, int n)
-{
-	struct page *page;
-	struct page *tmp;
-
-	BUG_ON(!n);
-	BUG_ON(!head);
-
-	page = *head;
-
-	if (!page)
-		return NULL;
-
-	while (page) {
-		tmp = page_chain_next(page);
-		if (--n == 0)
-			break; /* found sufficient pages */
-		if (tmp == NULL)
-			/* insufficient pages, don't use any of them. */
-			return NULL;
-		page = tmp;
-	}
-
-	/* add end of list marker for the returned list */
-	set_page_private(page, 0);
-	/* actual return value, and adjustment of head */
-	page = *head;
-	*head = tmp;
-	return page;
-}
-
-/* may be used outside of locks to find the tail of a (usually short)
- * "private" page chain, before adding it back to a global chain head
- * with page_chain_add() under a spinlock. */
-static struct page *page_chain_tail(struct page *page, int *len)
-{
-	struct page *tmp;
-	int i = 1;
-	while ((tmp = page_chain_next(page))) {
-		++i;
-		page = tmp;
-	}
-	if (len)
-		*len = i;
-	return page;
-}
-
-static int page_chain_free(struct page *page)
-{
-	struct page *tmp;
-	int i = 0;
-	page_chain_for_each_safe(page, tmp) {
-		put_page(page);
-		++i;
-	}
-	return i;
-}
-
-static void page_chain_add(struct page **head,
-		struct page *chain_first, struct page *chain_last)
-{
-#if 1
-	struct page *tmp;
-	tmp = page_chain_tail(chain_first, NULL);
-	BUG_ON(tmp != chain_last);
-#endif
-
-	/* add chain to head */
-	set_page_private(chain_last, (unsigned long)*head);
-	*head = chain_first;
-}
-
-static struct page *__drbd_alloc_pages(struct drbd_device *device,
-				       unsigned int number)
+static struct page *__drbd_alloc_pages(unsigned int number)
 {
 	struct page *page = NULL;
 	struct page *tmp = NULL;
 	unsigned int i = 0;
 
-	/* Yes, testing drbd_pp_vacant outside the lock is racy.
-	 * So what. It saves a spin_lock. */
-	if (drbd_pp_vacant >= number) {
-		spin_lock(&drbd_pp_lock);
-		page = page_chain_del(&drbd_pp_pool, number);
-		if (page)
-			drbd_pp_vacant -= number;
-		spin_unlock(&drbd_pp_lock);
-		if (page)
-			return page;
-	}
-
 	/* GFP_TRY, because we must not cause arbitrary write-out: in a DRBD
 	 * "criss-cross" setup, that might cause write-out on some other DRBD,
 	 * which in turn might block on the other node at this very place. */
 	for (i = 0; i < number; i++) {
-		tmp = alloc_page(GFP_TRY);
+		tmp = mempool_alloc(&drbd_buffer_page_pool, GFP_TRY);
 		if (!tmp)
-			break;
+			goto fail;
 		set_page_private(tmp, (unsigned long)page);
 		page = tmp;
 	}
-
-	if (i == number)
-		return page;
-
-	/* Not enough pages immediately available this time.
-	 * No need to jump around here, drbd_alloc_pages will retry this
-	 * function "soon". */
-	if (page) {
-		tmp = page_chain_tail(page, NULL);
-		spin_lock(&drbd_pp_lock);
-		page_chain_add(&drbd_pp_pool, page, tmp);
-		drbd_pp_vacant += i;
-		spin_unlock(&drbd_pp_lock);
+	return page;
+fail:
+	page_chain_for_each_safe(page, tmp) {
+		set_page_private(page, 0);
+		mempool_free(page, &drbd_buffer_page_pool);
 	}
 	return NULL;
 }
 
-static void reclaim_finished_net_peer_reqs(struct drbd_device *device,
-					   struct list_head *to_be_freed)
-{
-	struct drbd_peer_request *peer_req, *tmp;
-
-	/* The EEs are always appended to the end of the list. Since
-	   they are sent in order over the wire, they have to finish
-	   in order. As soon as we see the first not finished we can
-	   stop to examine the list...
-	 */
-
-	list_for_each_entry_safe(peer_req, tmp, &device->net_ee, w.list) {
-		if (drbd_peer_req_has_active_page(peer_req))
-			break;
-		list_move(&peer_req->w.list, to_be_freed);
-	}
-}
-
-static void drbd_reclaim_net_peer_reqs(struct drbd_device *device)
-{
-	LIST_HEAD(reclaimed);
-	struct drbd_peer_request *peer_req, *t;
-
-	spin_lock_irq(&device->resource->req_lock);
-	reclaim_finished_net_peer_reqs(device, &reclaimed);
-	spin_unlock_irq(&device->resource->req_lock);
-	list_for_each_entry_safe(peer_req, t, &reclaimed, w.list)
-		drbd_free_net_peer_req(device, peer_req);
-}
-
-static void conn_reclaim_net_peer_reqs(struct drbd_connection *connection)
-{
-	struct drbd_peer_device *peer_device;
-	int vnr;
-
-	rcu_read_lock();
-	idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
-		struct drbd_device *device = peer_device->device;
-		if (!atomic_read(&device->pp_in_use_by_net))
-			continue;
-
-		kref_get(&device->kref);
-		rcu_read_unlock();
-		drbd_reclaim_net_peer_reqs(device);
-		kref_put(&device->kref, drbd_destroy_device);
-		rcu_read_lock();
-	}
-	rcu_read_unlock();
-}
-
 /**
  * drbd_alloc_pages() - Returns @number pages, retries forever (or until signalled)
  * @peer_device: DRBD device.
@@ -263,9 +113,8 @@ struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int
 			      bool retry)
 {
 	struct drbd_device *device = peer_device->device;
-	struct page *page = NULL;
+	struct page *page;
 	struct net_conf *nc;
-	DEFINE_WAIT(wait);
 	unsigned int mxb;
 
 	rcu_read_lock();
@@ -273,37 +122,9 @@ struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int
 	mxb = nc ? nc->max_buffers : 1000000;
 	rcu_read_unlock();
 
-	if (atomic_read(&device->pp_in_use) < mxb)
-		page = __drbd_alloc_pages(device, number);
-
-	/* Try to keep the fast path fast, but occasionally we need
-	 * to reclaim the pages we lended to the network stack. */
-	if (page && atomic_read(&device->pp_in_use_by_net) > 512)
-		drbd_reclaim_net_peer_reqs(device);
-
-	while (page == NULL) {
-		prepare_to_wait(&drbd_pp_wait, &wait, TASK_INTERRUPTIBLE);
-
-		drbd_reclaim_net_peer_reqs(device);
-
-		if (atomic_read(&device->pp_in_use) < mxb) {
-			page = __drbd_alloc_pages(device, number);
-			if (page)
-				break;
-		}
-
-		if (!retry)
-			break;
-
-		if (signal_pending(current)) {
-			drbd_warn(device, "drbd_alloc_pages interrupted!\n");
-			break;
-		}
-
-		if (schedule_timeout(HZ/10) == 0)
-			mxb = UINT_MAX;
-	}
-	finish_wait(&drbd_pp_wait, &wait);
+	if (atomic_read(&device->pp_in_use) >= mxb)
+		schedule_timeout_interruptible(HZ / 10);
+	page = __drbd_alloc_pages(number);
 
 	if (page)
 		atomic_add(number, &device->pp_in_use);
@@ -314,29 +135,25 @@ struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int
  * Is also used from inside an other spin_lock_irq(&resource->req_lock);
  * Either links the page chain back to the global pool,
  * or returns all pages to the system. */
-static void drbd_free_pages(struct drbd_device *device, struct page *page, int is_net)
+static void drbd_free_pages(struct drbd_device *device, struct page *page)
 {
-	atomic_t *a = is_net ? &device->pp_in_use_by_net : &device->pp_in_use;
-	int i;
+	struct page *tmp;
+	int i = 0;
 
 	if (page == NULL)
 		return;
 
-	if (drbd_pp_vacant > (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * drbd_minor_count)
-		i = page_chain_free(page);
-	else {
-		struct page *tmp;
-		tmp = page_chain_tail(page, &i);
-		spin_lock(&drbd_pp_lock);
-		page_chain_add(&drbd_pp_pool, page, tmp);
-		drbd_pp_vacant += i;
-		spin_unlock(&drbd_pp_lock);
-	}
-	i = atomic_sub_return(i, a);
+	page_chain_for_each_safe(page, tmp) {
+		set_page_private(page, 0);
+		if (page_count(page) == 1)
+			mempool_free(page, &drbd_buffer_page_pool);
+		else
+			put_page(page);
+		i++;
+	}
+	i = atomic_sub_return(i, &device->pp_in_use);
 	if (i < 0)
-		drbd_warn(device, "ASSERTION FAILED: %s: %d < 0\n",
-			is_net ? "pp_in_use_by_net" : "pp_in_use", i);
-	wake_up(&drbd_pp_wait);
+		drbd_warn(device, "ASSERTION FAILED: pp_in_use: %d < 0\n", i);
 }
 
 /*
@@ -380,6 +197,8 @@ drbd_alloc_peer_req(struct drbd_peer_device *peer_device, u64 id, sector_t secto
 			     gfpflags_allow_blocking(gfp_mask));
 		if (!page)
 			goto fail;
+		if (!mempool_is_saturated(&drbd_buffer_page_pool))
+			peer_req->flags |= EE_RELEASE_TO_MEMPOOL;
 	}
 
 	memset(peer_req, 0, sizeof(*peer_req));
@@ -403,13 +222,12 @@ drbd_alloc_peer_req(struct drbd_peer_device *peer_device, u64 id, sector_t secto
 	return NULL;
 }
 
-void __drbd_free_peer_req(struct drbd_device *device, struct drbd_peer_request *peer_req,
-			  int is_net)
+void drbd_free_peer_req(struct drbd_device *device, struct drbd_peer_request *peer_req)
 {
 	might_sleep();
 	if (peer_req->flags & EE_HAS_DIGEST)
 		kfree(peer_req->digest);
-	drbd_free_pages(device, peer_req->pages, is_net);
+	drbd_free_pages(device, peer_req->pages);
 	D_ASSERT(device, atomic_read(&peer_req->pending_bios) == 0);
 	D_ASSERT(device, drbd_interval_empty(&peer_req->i));
 	if (!expect(device, !(peer_req->flags & EE_CALL_AL_COMPLETE_IO))) {
@@ -424,14 +242,13 @@ int drbd_free_peer_reqs(struct drbd_device *device, struct list_head *list)
 	LIST_HEAD(work_list);
 	struct drbd_peer_request *peer_req, *t;
 	int count = 0;
-	int is_net = list == &device->net_ee;
 
 	spin_lock_irq(&device->resource->req_lock);
 	list_splice_init(list, &work_list);
 	spin_unlock_irq(&device->resource->req_lock);
 
 	list_for_each_entry_safe(peer_req, t, &work_list, w.list) {
-		__drbd_free_peer_req(device, peer_req, is_net);
+		drbd_free_peer_req(device, peer_req);
 		count++;
 	}
 	return count;
@@ -443,18 +260,13 @@ int drbd_free_peer_reqs(struct drbd_device *device, struct list_head *list)
 static int drbd_finish_peer_reqs(struct drbd_device *device)
 {
 	LIST_HEAD(work_list);
-	LIST_HEAD(reclaimed);
 	struct drbd_peer_request *peer_req, *t;
 	int err = 0;
 
 	spin_lock_irq(&device->resource->req_lock);
-	reclaim_finished_net_peer_reqs(device, &reclaimed);
 	list_splice_init(&device->done_ee, &work_list);
 	spin_unlock_irq(&device->resource->req_lock);
 
-	list_for_each_entry_safe(peer_req, t, &reclaimed, w.list)
-		drbd_free_net_peer_req(device, peer_req);
-
 	/* possible callbacks here:
 	 * e_end_block, and e_end_resync_block, e_send_superseded.
 	 * all ignore the last argument.
@@ -1975,7 +1787,7 @@ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size)
 		data_size -= len;
 	}
 	kunmap(page);
-	drbd_free_pages(peer_device->device, page, 0);
+	drbd_free_pages(peer_device->device, page);
 	return err;
 }
 
@@ -5220,16 +5032,6 @@ static int drbd_disconnected(struct drbd_peer_device *peer_device)
 		put_ldev(device);
 	}
 
-	/* tcp_close and release of sendpage pages can be deferred.  I don't
-	 * want to use SO_LINGER, because apparently it can be deferred for
-	 * more than 20 seconds (longest time I checked).
-	 *
-	 * Actually we don't care for exactly when the network stack does its
-	 * put_page(), but release our reference on these pages right here.
-	 */
-	i = drbd_free_peer_reqs(device, &device->net_ee);
-	if (i)
-		drbd_info(device, "net_ee not empty, killed %u entries\n", i);
 	i = atomic_read(&device->pp_in_use_by_net);
 	if (i)
 		drbd_info(device, "pp_in_use_by_net = %d, expected 0\n", i);
@@ -5976,8 +5778,6 @@ int drbd_ack_receiver(struct drbd_thread *thi)
 	while (get_t_state(thi) == RUNNING) {
 		drbd_thread_current_set_cpu(thi);
 
-		conn_reclaim_net_peer_reqs(connection);
-
 		if (test_and_clear_bit(SEND_PING, &connection->flags)) {
 			if (drbd_send_ping(connection)) {
 				drbd_err(connection, "drbd_send_ping has failed\n");
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 4352a50fbb3f..d74064bb64be 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1029,22 +1029,6 @@ int drbd_resync_finished(struct drbd_peer_device *peer_device)
 	return 1;
 }
 
-/* helper */
-static void move_to_net_ee_or_free(struct drbd_device *device, struct drbd_peer_request *peer_req)
-{
-	if (drbd_peer_req_has_active_page(peer_req)) {
-		/* This might happen if sendpage() has not finished */
-		int i = PFN_UP(peer_req->i.size);
-		atomic_add(i, &device->pp_in_use_by_net);
-		atomic_sub(i, &device->pp_in_use);
-		spin_lock_irq(&device->resource->req_lock);
-		list_add_tail(&peer_req->w.list, &device->net_ee);
-		spin_unlock_irq(&device->resource->req_lock);
-		wake_up(&drbd_pp_wait);
-	} else
-		drbd_free_peer_req(device, peer_req);
-}
-
 /**
  * w_e_end_data_req() - Worker callback, to send a P_DATA_REPLY packet in response to a P_DATA_REQUEST
  * @w: work object.
@@ -1058,9 +1042,8 @@ int w_e_end_data_req(struct drbd_work *w, int cancel)
 	int err;
 
 	if (unlikely(cancel)) {
-		drbd_free_peer_req(device, peer_req);
-		dec_unacked(device);
-		return 0;
+		err = 0;
+		goto out;
 	}
 
 	if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) {
@@ -1073,12 +1056,12 @@ int w_e_end_data_req(struct drbd_work *w, int cancel)
 		err = drbd_send_ack(peer_device, P_NEG_DREPLY, peer_req);
 	}
 
-	dec_unacked(device);
-
-	move_to_net_ee_or_free(device, peer_req);
-
 	if (unlikely(err))
 		drbd_err(device, "drbd_send_block() failed\n");
+out:
+	dec_unacked(device);
+	drbd_free_peer_req(device, peer_req);
+
 	return err;
 }
 
@@ -1119,9 +1102,8 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
 	int err;
 
 	if (unlikely(cancel)) {
-		drbd_free_peer_req(device, peer_req);
-		dec_unacked(device);
-		return 0;
+		err = 0;
+		goto out;
 	}
 
 	if (get_ldev_if_state(device, D_FAILED)) {
@@ -1154,13 +1136,12 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
 		/* update resync data with failure */
 		drbd_rs_failed_io(peer_device, peer_req->i.sector, peer_req->i.size);
 	}
-
-	dec_unacked(device);
-
-	move_to_net_ee_or_free(device, peer_req);
-
 	if (unlikely(err))
 		drbd_err(device, "drbd_send_block() failed\n");
+out:
+	dec_unacked(device);
+	drbd_free_peer_req(device, peer_req);
+
 	return err;
 }
 
@@ -1175,9 +1156,8 @@ int w_e_end_csum_rs_req(struct drbd_work *w, int cancel)
 	int err, eq = 0;
 
 	if (unlikely(cancel)) {
-		drbd_free_peer_req(device, peer_req);
-		dec_unacked(device);
-		return 0;
+		err = 0;
+		goto out;
 	}
 
 	if (get_ldev(device)) {
@@ -1219,12 +1199,12 @@ int w_e_end_csum_rs_req(struct drbd_work *w, int cancel)
 		if (drbd_ratelimit())
 			drbd_err(device, "Sending NegDReply. I guess it gets messy.\n");
 	}
-
-	dec_unacked(device);
-	move_to_net_ee_or_free(device, peer_req);
-
 	if (unlikely(err))
 		drbd_err(device, "drbd_send_block/ack() failed\n");
+out:
+	dec_unacked(device);
+	drbd_free_peer_req(device, peer_req);
+
 	return err;
 }
 

base-commit: b02f5eedbcabe6e1982fdd7ff3f0ac5d1fddc68f
-- 
2.49.0