From nobody Sat Nov 30 10:48:39 2024
Date: Tue, 10 Sep 2024 17:14:49 +0000
In-Reply-To: <20240910171458.219195-1-almasrymina@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20240910171458.219195-1-almasrymina@google.com>
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Message-ID: <20240910171458.219195-6-almasrymina@google.com>
Subject: [PATCH net-next v26 05/13] page_pool: devmem support
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, bpf@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org
Cc: Mina Almasry, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Donald Hunter, Jonathan Corbet, Richard Henderson,
 Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
 "James E.J. Bottomley", Helge Deller, Andreas Larsson,
 Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
 Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann, Steffen Klassert,
 Herbert Xu, David Ahern, Willem de Bruijn, Björn Töpel,
 Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Shuah Khan,
 Alexei Starovoitov, Daniel Borkmann, John Fastabend, Sumit Semwal,
 Christian König, Pavel Begunkov, David Wei, Jason Gunthorpe,
 Yunsheng Lin, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt,
 Jeroen de Borst, Praveen Kaligineedi, Bagas Sanjaya,
 Christoph Hellwig, Nikolay Aleksandrov, Taehee Yoo,
 linux-mm@kvack.org, Matthew Wilcox
Content-Type: text/plain; charset="utf-8"

Convert netmem to be a union of struct page and struct net_iov. Overload
the LSB of netmem_ref to indicate that it's a net_iov; otherwise it's a
page.
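
For illustration only (not part of the patch): a minimal userspace sketch
of the LSB-tagging idea described above. All demo_* names are hypothetical
stand-ins, not kernel APIs, and the sketch assumes both struct types are
at least 2-byte aligned, so bit 0 of a valid pointer is always clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page and struct net_iov. */
struct demo_page { long payload; };
struct demo_iov  { long payload; };

/* Bit 0 of the reference marks "this is a demo_iov" (assumption: mirrors
 * the role NET_IOV plays for netmem_ref).
 */
#define DEMO_IOV_TAG ((uintptr_t)0x1)

typedef uintptr_t demo_ref;

static inline demo_ref demo_page_to_ref(struct demo_page *page)
{
	/* A page reference is the untagged pointer itself. */
	assert(((uintptr_t)page & DEMO_IOV_TAG) == 0);
	return (uintptr_t)page;
}

static inline demo_ref demo_iov_to_ref(struct demo_iov *iov)
{
	/* Tagging only works if bit 0 is free, i.e. the pointer is aligned. */
	assert(((uintptr_t)iov & DEMO_IOV_TAG) == 0);
	return (uintptr_t)iov | DEMO_IOV_TAG;
}

static inline bool demo_ref_is_iov(demo_ref ref)
{
	return (ref & DEMO_IOV_TAG) != 0;
}

/* Returns NULL when the reference is not page backed. */
static inline struct demo_page *demo_ref_to_page(demo_ref ref)
{
	if (demo_ref_is_iov(ref))
		return NULL;
	return (struct demo_page *)ref;
}

/* Returns NULL when the reference is not iov backed; clears the tag
 * bit before converting back to a pointer otherwise.
 */
static inline struct demo_iov *demo_ref_to_iov(demo_ref ref)
{
	if (!demo_ref_is_iov(ref))
		return NULL;
	return (struct demo_iov *)(ref & ~DEMO_IOV_TAG);
}
```

Each conversion helper checks the tag first, so a caller that hands an
iov-backed reference to page-only code gets NULL rather than a bogus
pointer. This is the same failure mode the patch gives netmem_to_page().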

Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:

struct {
	unsigned long pp_magic;
	struct page_pool *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	atomic_long_t pp_ref_count;
};

Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.

Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.

Signed-off-by: Mina Almasry
Reviewed-by: Pavel Begunkov
Acked-by: Jakub Kicinski

---

v23:
- Fix comment on netmem_is_pref_nid (Jakub)

v19:
- Move page_pool_set_dma_addr(_netmem) to page_pool_priv.h
- Don't reset niov dma_addr on allocation/free. Instead, it's set once
  when the binding happens and it never changes (Jakub)

v17:
- Rename netmem_to_pfn to netmem_pfn_trace (Jakub)
- Move some low level netmem helpers to netmem_priv.h (Jakub).

v13:
- Move NET_IOV dependent changes to this patch.
- Fixed comment (Pavel)
- Applied Reviewed-by from Pavel.

v9: https://lore.kernel.org/netdev/20240403002053.2376017-8-almasrymina@google.com/
- Remove CONFIG checks in netmem_is_net_iov() (Pavel/David/Jens)

v7:
- Remove static_branch_unlikely from netmem_to_net_iov(). We're getting
  better results from the fast path in bench_page_pool_simple tests
  without the static_branch_unlikely, and the addition of
  static_branch_unlikely doesn't improve performance of devmem TCP.

  Additionally only check netmem_to_net_iov() if
  CONFIG_DMA_SHARED_BUFFER is enabled, otherwise dmabuf net_iovs cannot
  exist anyway.

  net-next base: 8 cycle fast path.
  with static_branch_unlikely: 10 cycle fast path.
  without static_branch_unlikely: 9 cycle fast path.
  CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path as baseline.

  Performance of devmem TCP is at 95% of line rate regardless of whether
  static_branch_unlikely is used.

v6:
- Rebased on top of the merged netmem_ref type.
- Rebased on top of the merged skb_pp_frag_ref() changes.

v5:
- Use netmem instead of page* with LSB set.
- Use pp_ref_count for refcounting net_iov.
- Removed many of the custom checks for netmem.

v1:
- Disable fragmentation support for iov properly.
- fix napi_pp_put_page() path (Yunsheng).
- Use pp_frag_count for devmem refcounting.

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox

---
 include/net/netmem.h             | 124 +++++++++++++++++++++++++++++--
 include/net/page_pool/helpers.h  |  39 ++--------
 include/trace/events/page_pool.h |  12 +--
 net/core/devmem.c                |   7 ++
 net/core/netmem_priv.h           |  31 ++++++++
 net/core/page_pool.c             |  25 ++++---
 net/core/page_pool_priv.h        |  26 +++++++
 net/core/skbuff.c                |  23 +++---
 8 files changed, 218 insertions(+), 69 deletions(-)
 create mode 100644 net/core/netmem_priv.h

diff --git a/include/net/netmem.h b/include/net/netmem.h
index c23e224dd6a0..8a6e20be4b9d 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -8,12 +8,52 @@
 #ifndef _NET_NETMEM_H
 #define _NET_NETMEM_H
 
+#include
+#include
+
 /* net_iov */
 
+DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
+/* We overload the LSB of the struct page pointer to indicate whether it's
+ * a page or net_iov.
+ */
+#define NET_IOV 0x01UL
+
 struct net_iov {
+	unsigned long __unused_padding;
+	unsigned long pp_magic;
+	struct page_pool *pp;
 	struct dmabuf_genpool_chunk_owner *owner;
+	unsigned long dma_addr;
+	atomic_long_t pp_ref_count;
 };
 
+/* These fields in struct page are used by the page_pool and net stack:
+ *
+ *        struct {
+ *                unsigned long pp_magic;
+ *                struct page_pool *pp;
+ *                unsigned long _pp_mapping_pad;
+ *                unsigned long dma_addr;
+ *                atomic_long_t pp_ref_count;
+ *        };
+ *
+ * We mirror the page_pool fields here so the page_pool can access these fields
+ * without worrying whether the underlying fields belong to a page or net_iov.
+ *
+ * The non-net stack fields of struct page are private to the mm stack and must
+ * never be mirrored to net_iov.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
+NET_IOV_ASSERT_OFFSET(pp, pp);
+NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
+NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+#undef NET_IOV_ASSERT_OFFSET
+
 /* netmem */
 
 /**
@@ -27,20 +67,37 @@ struct net_iov {
  */
 typedef unsigned long __bitwise netmem_ref;
 
+static inline bool netmem_is_net_iov(const netmem_ref netmem)
+{
+	return (__force unsigned long)netmem & NET_IOV;
+}
+
 /* This conversion fails (returns NULL) if the netmem_ref is not struct page
  * backed.
- *
- * Currently struct page is the only possible netmem, and this helper never
- * fails.
  */
 static inline struct page *netmem_to_page(netmem_ref netmem)
 {
+	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
+		return NULL;
+
 	return (__force struct page *)netmem;
 }
 
-/* Converting from page to netmem is always safe, because a page can always be
- * a netmem.
- */
+static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem))
+		return (struct net_iov *)((__force unsigned long)netmem &
+					  ~NET_IOV);
+
+	DEBUG_NET_WARN_ON_ONCE(true);
+	return NULL;
+}
+
+static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
+{
+	return (__force netmem_ref)((unsigned long)niov | NET_IOV);
+}
+
 static inline netmem_ref page_to_netmem(struct page *page)
 {
 	return (__force netmem_ref)page;
@@ -48,17 +105,70 @@ static inline netmem_ref page_to_netmem(struct page *page)
 
 static inline int netmem_ref_count(netmem_ref netmem)
 {
+	/* The non-pp refcount of net_iov is always 1. On net_iov, we only
+	 * support pp refcounting which uses the pp_ref_count field.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return 1;
+
 	return page_ref_count(netmem_to_page(netmem));
 }
 
-static inline unsigned long netmem_to_pfn(netmem_ref netmem)
+static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
 {
+	if (netmem_is_net_iov(netmem))
+		return 0;
+
 	return page_to_pfn(netmem_to_page(netmem));
 }
 
+static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
+{
+	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
+}
+
+static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp;
+}
+
+static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
+{
+	return &__netmem_clear_lsb(netmem)->pp_ref_count;
+}
+
+static inline bool netmem_is_pref_nid(netmem_ref netmem, int pref_nid)
+{
+	/* NUMA node preference only makes sense if we're allocating
+	 * system memory. Memory providers (which give us net_iovs)
+	 * choose for us.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return true;
+
+	return page_to_nid(netmem_to_page(netmem)) == pref_nid;
+}
+
 static inline netmem_ref netmem_compound_head(netmem_ref netmem)
 {
+	/* niov are never compounded */
+	if (netmem_is_net_iov(netmem))
+		return netmem;
+
 	return page_to_netmem(compound_head(netmem_to_page(netmem)));
 }
 
+static inline void *netmem_address(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem))
+		return NULL;
+
+	return page_address(netmem_to_page(netmem));
+}
+
+static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->dma_addr;
+}
+
 #endif /* _NET_NETMEM_H */
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 2b43a893c619..793e6fd78bc5 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -216,7 +216,7 @@ page_pool_get_dma_dir(const struct page_pool *pool)
 
 static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr)
 {
-	atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr);
+	atomic_long_set(netmem_get_pp_ref_count_ref(netmem), nr);
 }
 
 /**
@@ -244,7 +244,7 @@ static inline void page_pool_fragment_page(struct page *page, long nr)
 
 static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 {
-	struct page *page = netmem_to_page(netmem);
+	atomic_long_t *pp_ref_count = netmem_get_pp_ref_count_ref(netmem);
 	long ret;
 
 	/* If nr == pp_ref_count then we have cleared all remaining
@@ -261,19 +261,19 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * initially, and only overwrite it when the page is partitioned into
 	 * more than one piece.
 	 */
-	if (atomic_long_read(&page->pp_ref_count) == nr) {
+	if (atomic_long_read(pp_ref_count) == nr) {
 		/* As we have ensured nr is always one for constant case using
 		 * the BUILD_BUG_ON(), only need to handle the non-constant case
 		 * here for pp_ref_count draining, which is a rare case.
 		 */
 		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
 		if (!__builtin_constant_p(nr))
-			atomic_long_set(&page->pp_ref_count, 1);
+			atomic_long_set(pp_ref_count, 1);
 
 		return 0;
 	}
 
-	ret = atomic_long_sub_return(nr, &page->pp_ref_count);
+	ret = atomic_long_sub_return(nr, pp_ref_count);
 	WARN_ON(ret < 0);
 
 	/* We are the last user here too, reset pp_ref_count back to 1 to
@@ -282,7 +282,7 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * page_pool_unref_page() currently.
 	 */
 	if (unlikely(!ret))
-		atomic_long_set(&page->pp_ref_count, 1);
+		atomic_long_set(pp_ref_count, 1);
 
 	return ret;
 }
@@ -401,9 +401,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va,
 
 static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	dma_addr_t ret = page->dma_addr;
+	dma_addr_t ret = netmem_get_dma_addr(netmem);
 
 	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
 		ret <<= PAGE_SHIFT;
@@ -423,24 +421,6 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
 	return page_pool_get_dma_addr_netmem(page_to_netmem((struct page *)page));
 }
 
-static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem,
-						 dma_addr_t addr)
-{
-	struct page *page = netmem_to_page(netmem);
-
-	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
-		page->dma_addr = addr >> PAGE_SHIFT;
-
-		/* We assume page alignment to shave off bottom bits,
-		 * if this "compression" doesn't work we need to drop.
-		 */
-		return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
-	}
-
-	page->dma_addr = addr;
-	return false;
-}
-
 /**
  * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
  * @pool: &page_pool the @page belongs to
@@ -463,11 +443,6 @@ static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
 				      page_pool_get_dma_dir(pool));
 }
 
-static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
-{
-	return page_pool_set_dma_addr_netmem(page_to_netmem(page), addr);
-}
-
 static inline bool page_pool_put(struct page_pool *pool)
 {
 	return refcount_dec_and_test(&pool->user_cnt);
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 543e54e432a1..31825ed30032 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -57,12 +57,12 @@ TRACE_EVENT(page_pool_state_release,
 		__entry->pool		= pool;
 		__entry->netmem		= (__force unsigned long)netmem;
 		__entry->release	= release;
-		__entry->pfn		= netmem_to_pfn(netmem);
+		__entry->pfn		= netmem_pfn_trace(netmem);
 	),
 
-	TP_printk("page_pool=%p netmem=%p pfn=0x%lx release=%u",
+	TP_printk("page_pool=%p netmem=%p is_net_iov=%lu pfn=0x%lx release=%u",
 		  __entry->pool, (void *)__entry->netmem,
-		  __entry->pfn, __entry->release)
+		  __entry->netmem & NET_IOV, __entry->pfn, __entry->release)
 );
 
 TRACE_EVENT(page_pool_state_hold,
@@ -83,12 +83,12 @@ TRACE_EVENT(page_pool_state_hold,
 		__entry->pool	= pool;
 		__entry->netmem	= (__force unsigned long)netmem;
 		__entry->hold	= hold;
-		__entry->pfn	= netmem_to_pfn(netmem);
+		__entry->pfn	= netmem_pfn_trace(netmem);
 	),
 
-	TP_printk("page_pool=%p netmem=%p pfn=0x%lx hold=%u",
+	TP_printk("page_pool=%p netmem=%p is_net_iov=%lu, pfn=0x%lx hold=%u",
 		  __entry->pool, (void *)__entry->netmem,
-		  __entry->pfn, __entry->hold)
+		  __entry->netmem & NET_IOV, __entry->pfn, __entry->hold)
 );
 
 TRACE_EVENT(page_pool_update_nid,
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 9beb03763dc9..7efeb602cf45 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -18,6 +18,7 @@
 #include
 
 #include "devmem.h"
+#include "page_pool_priv.h"
 
 /* Device memory support */
 
@@ -82,6 +83,10 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	index = offset / PAGE_SIZE;
 	niov = &owner->niovs[index];
 
+	niov->pp_magic = 0;
+	niov->pp = NULL;
+	atomic_long_set(&niov->pp_ref_count, 0);
+
 	return niov;
 }
 
@@ -269,6 +274,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 		for (i = 0; i < owner->num_niovs; i++) {
 			niov = &owner->niovs[i];
 			niov->owner = owner;
+			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
+						      net_devmem_get_dma_addr(niov));
 		}
 
 		virtual += len;
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
new file mode 100644
index 000000000000..7eadb8393e00
--- /dev/null
+++ b/net/core/netmem_priv.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __NETMEM_PRIV_H
+#define __NETMEM_PRIV_H
+
+static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp_magic;
+}
+
+static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
+{
+	__netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
+}
+
+static inline void netmem_clear_pp_magic(netmem_ref netmem)
+{
+	__netmem_clear_lsb(netmem)->pp_magic = 0;
+}
+
+static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
+{
+	__netmem_clear_lsb(netmem)->pp = pool;
+}
+
+static inline void netmem_set_dma_addr(netmem_ref netmem,
+				       unsigned long dma_addr)
+{
+	__netmem_clear_lsb(netmem)->dma_addr = dma_addr;
+}
+#endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2abe6e919224..52659db2d765 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -24,8 +24,11 @@
 
 #include
 
+#include "netmem_priv.h"
 #include "page_pool_priv.h"
 
+DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
@@ -358,7 +361,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 		if (unlikely(!netmem))
 			break;
 
-		if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) {
+		if (likely(netmem_is_pref_nid(netmem, pref_nid))) {
 			pool->alloc.cache[pool->alloc.count++] = netmem;
 		} else {
 			/* NUMA mismatch;
@@ -454,10 +457,8 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp = pool;
-	page->pp_magic |= PP_SIGNATURE;
+	netmem_set_pp(netmem, pool);
+	netmem_or_pp_magic(netmem, PP_SIGNATURE);
 
 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
@@ -472,10 +473,8 @@ static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp_magic = 0;
-	page->pp = NULL;
+	netmem_clear_pp_magic(netmem);
+	netmem_set_pp(netmem, NULL);
 }
 
 static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
@@ -692,8 +691,9 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem,
 
 static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
 {
-	return page_ref_count(netmem_to_page(netmem)) == 1 &&
-	       !page_is_pfmemalloc(netmem_to_page(netmem));
+	return netmem_is_net_iov(netmem) ||
+	       (page_ref_count(netmem_to_page(netmem)) == 1 &&
+		!page_is_pfmemalloc(netmem_to_page(netmem)));
 }
 
 /* If the page refcnt == 1, this will try to recycle the page.
@@ -728,6 +728,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 		/* Page found as candidate for recycling */
 		return netmem;
 	}
+
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
 	 *
 	 * Many drivers split up the page into fragments, and some
@@ -949,7 +950,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	/* Empty recycle ring */
 	while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) {
 		/* Verify the refcnt invariant of cached pages */
-		if (!(page_ref_count(netmem_to_page(netmem)) == 1))
+		if (!(netmem_ref_count(netmem) == 1))
 			pr_crit("%s() page_pool refcnt %d violation\n",
 				__func__, netmem_ref_count(netmem));
 
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 90665d40f1eb..d602c1e728c2 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -3,10 +3,36 @@
 #ifndef __PAGE_POOL_PRIV_H
 #define __PAGE_POOL_PRIV_H
 
+#include
+
+#include "netmem_priv.h"
+
 s32 page_pool_inflight(const struct page_pool *pool, bool strict);
 
 int page_pool_list(struct page_pool *pool);
 void page_pool_detached(struct page_pool *pool);
 void page_pool_unlist(struct page_pool *pool);
 
+static inline bool
+page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr)
+{
+	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
+		netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT);
+
+		/* We assume page alignment to shave off bottom bits,
+		 * if this "compression" doesn't work we need to drop.
+		 */
+		return addr != (dma_addr_t)netmem_get_dma_addr(netmem)
+				       << PAGE_SHIFT;
+	}
+
+	netmem_set_dma_addr(netmem, addr);
+	return false;
+}
+
+static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+	return page_pool_set_dma_addr_netmem(page_to_netmem(page), addr);
+}
+
 #endif
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..d9634ab342cc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -88,6 +88,7 @@
 #include
 
 #include "dev.h"
+#include "netmem_priv.h"
 #include "sock_destructor.h"
 
 #ifdef CONFIG_SKB_EXTENSIONS
@@ -920,9 +921,9 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool is_pp_page(struct page *page)
+static bool is_pp_netmem(netmem_ref netmem)
 {
-	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+	return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
 }
 
 int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
@@ -1020,9 +1021,7 @@ EXPORT_SYMBOL(skb_cow_data_for_xdp);
 #if IS_ENABLED(CONFIG_PAGE_POOL)
 bool napi_pp_put_page(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page = compound_head(page);
+	netmem = netmem_compound_head(netmem);
 
 	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
 	 * in order to preserve any existing bits, such as bit 0 for the
@@ -1031,10 +1030,10 @@ bool napi_pp_put_page(netmem_ref netmem)
 	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
 	 * to avoid recycling the pfmemalloc page.
 	 */
-	if (unlikely(!is_pp_page(page)))
+	if (unlikely(!is_pp_netmem(netmem)))
 		return false;
 
-	page_pool_put_full_netmem(page->pp, page_to_netmem(page), false);
+	page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
 
 	return true;
 }
@@ -1061,7 +1060,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 static int skb_pp_frag_ref(struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo;
-	struct page *head_page;
+	netmem_ref head_netmem;
 	int i;
 
 	if (!skb->pp_recycle)
@@ -1070,11 +1069,11 @@ static int skb_pp_frag_ref(struct sk_buff *skb)
 	shinfo = skb_shinfo(skb);
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
-		head_page = compound_head(skb_frag_page(&shinfo->frags[i]));
-		if (likely(is_pp_page(head_page)))
-			page_pool_ref_page(head_page);
+		head_netmem = netmem_compound_head(shinfo->frags[i].netmem);
+		if (likely(is_pp_netmem(head_netmem)))
+			page_pool_ref_netmem(head_netmem);
 		else
-			page_ref_inc(head_page);
+			page_ref_inc(netmem_to_page(head_netmem));
 	}
 	return 0;
 }
-- 
2.46.0.598.g6f2099f65c-goog