From nobody Tue Dec 16 00:22:12 2025
From: Caleb Sander Mateos
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v6 1/3] dmapool: add NUMA affinity support
Date: Fri, 25 Apr 2025 20:06:34 -0600
Message-ID: <20250426020636.34355-2-csander@purestorage.com>
In-Reply-To: <20250426020636.34355-1-csander@purestorage.com>
References: <20250426020636.34355-1-csander@purestorage.com>

From: Keith Busch

Introduce dma_pool_create_node(), like dma_pool_create() but taking an
additional NUMA node argument. Allocate struct dma_pool on the desired
node, and store the node on dma_pool for allocating struct dma_page.
Make dma_pool_create() an alias for dma_pool_create_node() with node
set to NUMA_NO_NODE.
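As a usage sketch (hypothetical caller; the pool name and sizes here
are arbitrary, only dma_pool_create_node() itself comes from this
series):

	/*
	 * Allocate the pool's bookkeeping on the device's local NUMA
	 * node, so the metadata touched on every alloc/free stays
	 * node-local.
	 */
	struct dma_pool *pool;

	pool = dma_pool_create_node("example pool", dev, 256, 256, 0,
				    dev_to_node(dev));
	if (!pool)
		return -ENOMEM;

	/* Otherwise identical to a pool from dma_pool_create(). */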
Signed-off-by: Keith Busch
Signed-off-by: Caleb Sander Mateos
Reviewed-by: Jens Axboe
Reviewed-by: Sagi Grimberg
Reviewed-by: John Garry
Reviewed-by: Kanchan Joshi
---
 include/linux/dmapool.h | 17 +++++++++++++----
 mm/dmapool.c            | 16 ++++++++++------
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index f632ecfb4238..bbf1833a24f7 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -9,19 +9,20 @@
  */

 #ifndef LINUX_DMAPOOL_H
 #define LINUX_DMAPOOL_H

+#include <linux/numa.h>
 #include <linux/scatterlist.h>
 #include <asm/io.h>

 struct device;

 #ifdef CONFIG_HAS_DMA

-struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-			size_t size, size_t align, size_t allocation);
+struct dma_pool *dma_pool_create_node(const char *name, struct device *dev,
+			size_t size, size_t align, size_t boundary, int node);

 void dma_pool_destroy(struct dma_pool *pool);

 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
		     dma_addr_t *handle);
@@ -33,12 +34,13 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);

 struct dma_pool *dmam_pool_create(const char *name, struct device *dev,
				  size_t size, size_t align, size_t allocation);
 void dmam_pool_destroy(struct dma_pool *pool);

 #else /* !CONFIG_HAS_DMA */
-static inline struct dma_pool *dma_pool_create(const char *name,
-	struct device *dev, size_t size, size_t align, size_t allocation)
+static inline struct dma_pool *dma_pool_create_node(const char *name,
+	struct device *dev, size_t size, size_t align, size_t boundary,
+	int node)
 { return NULL; }
 static inline void dma_pool_destroy(struct dma_pool *pool) { }
 static inline void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
				    dma_addr_t *handle) { return NULL; }
 static inline void dma_pool_free(struct dma_pool *pool, void *vaddr,
@@ -47,10 +49,17 @@ static inline struct dma_pool *dmam_pool_create(const char *name,
	struct device *dev, size_t size, size_t align, size_t allocation)
 { return NULL; }
 static inline void dmam_pool_destroy(struct dma_pool *pool) { }
 #endif /* !CONFIG_HAS_DMA */

+static inline struct dma_pool *dma_pool_create(const char *name,
+	struct device *dev, size_t size, size_t align, size_t boundary)
+{
+	return dma_pool_create_node(name, dev, size, align, boundary,
+				    NUMA_NO_NODE);
+}
+
 static inline void *dma_pool_zalloc(struct dma_pool *pool,
				    gfp_t mem_flags, dma_addr_t *handle)
 {
	return dma_pool_alloc(pool, mem_flags | __GFP_ZERO, handle);
 }
diff --git a/mm/dmapool.c b/mm/dmapool.c
index f0bfc6c490f4..4de531542814 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -54,10 +54,11 @@ struct dma_pool {		/* the pool */
	size_t nr_pages;
	struct device *dev;
	unsigned int size;
	unsigned int allocation;
	unsigned int boundary;
+	int node;
	char name[32];
	struct list_head pools;
 };

 struct dma_page {		/* cacheable header for 'allocation' bytes */
@@ -197,16 +198,17 @@ static void pool_block_push(struct dma_pool *pool, struct dma_block *block,
	pool->next_block = block;
 }


 /**
- * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
+ * dma_pool_create_node - Creates a pool of consistent memory blocks, for dma.
  * @name: name of pool, for diagnostics
  * @dev: device that will be doing the DMA
  * @size: size of the blocks in this pool.
  * @align: alignment requirement for blocks; must be a power of two
  * @boundary: returned blocks won't cross this power of two boundary
+ * @node: optional NUMA node to allocate structs 'dma_pool' and 'dma_page' on
  * Context: not in_interrupt()
  *
  * Given one of these pools, dma_pool_alloc()
  * may be used to allocate memory.  Such memory will all have "consistent"
  * DMA mappings, accessible by the device and its driver without using
@@ -219,12 +221,13 @@ static void pool_block_push(struct dma_pool *pool, struct dma_block *block,
  * boundaries of 4KBytes.
  *
  * Return: a dma allocation pool with the requested characteristics, or
  * %NULL if one can't be created.
  */
-struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-				 size_t size, size_t align, size_t boundary)
+struct dma_pool *dma_pool_create_node(const char *name, struct device *dev,
+				      size_t size, size_t align, size_t boundary,
+				      int node)
 {
	struct dma_pool *retval;
	size_t allocation;
	bool empty;

@@ -249,11 +252,11 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
	else if ((boundary < size) || (boundary & (boundary - 1)))
		return NULL;

	boundary = min(boundary, allocation);

-	retval = kzalloc(sizeof(*retval), GFP_KERNEL);
+	retval = kzalloc_node(sizeof(*retval), GFP_KERNEL, node);
	if (!retval)
		return retval;

	strscpy(retval->name, name, sizeof(retval->name));

@@ -262,10 +265,11 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
	INIT_LIST_HEAD(&retval->page_list);
	spin_lock_init(&retval->lock);
	retval->size = size;
	retval->boundary = boundary;
	retval->allocation = allocation;
+	retval->node = node;
	INIT_LIST_HEAD(&retval->pools);

	/*
	 * pools_lock ensures that the ->dma_pools list does not get corrupted.
	 * pools_reg_lock ensures that there is not a race between
@@ -293,11 +297,11 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
		}
	}
	mutex_unlock(&pools_reg_lock);
	return retval;
 }
-EXPORT_SYMBOL(dma_pool_create);
+EXPORT_SYMBOL(dma_pool_create_node);

 static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
	unsigned int next_boundary = pool->boundary, offset = 0;
	struct dma_block *block, *first = NULL, *last = NULL;
@@ -333,11 +337,11 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)

 static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 {
	struct dma_page *page;

-	page = kmalloc(sizeof(*page), mem_flags);
+	page = kmalloc_node(sizeof(*page), mem_flags, pool->node);
	if (!page)
		return NULL;

	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
					 &page->dma, mem_flags);
-- 
2.45.2

From nobody Tue Dec 16 00:22:12 2025
From: Caleb Sander Mateos
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v6 2/3] nvme/pci: factor out nvme_init_hctx() helper
Date: Fri, 25 Apr 2025 20:06:35 -0600
Message-ID: <20250426020636.34355-3-csander@purestorage.com>
In-Reply-To: <20250426020636.34355-1-csander@purestorage.com>
References: <20250426020636.34355-1-csander@purestorage.com>

nvme_init_hctx() and nvme_admin_init_hctx() are very similar. In
preparation for adding more logic, factor out a nvme_init_hctx() helper.
Rename the old nvme_init_hctx() to nvme_io_init_hctx().
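For reference, the resulting qid mapping (a sketch of the call paths in
the diff below, not new code):

	/* The admin tagset has a single hctx; it always uses queue 0. */
	nvme_admin_init_hctx()	/* -> nvme_init_hctx(hctx, data, 0) */

	/* I/O hctx index i uses queue i + 1; queue 0 is the admin queue. */
	nvme_io_init_hctx()	/* -> nvme_init_hctx(hctx, data, hctx_idx + 1) */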
Signed-off-by: Caleb Sander Mateos
Reviewed-by: Kanchan Joshi
Reviewed-by: Sagi Grimberg
Reviewed-by: Keith Busch
Reviewed-by: Jens Axboe
---
 drivers/nvme/host/pci.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b178d52eac1b..642890ddada5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -395,32 +395,33 @@ static int nvme_pci_npages_prp(void)
	unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
	unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
	return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
 }

-static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
-				unsigned int hctx_idx)
+static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned qid)
 {
	struct nvme_dev *dev = to_nvme_dev(data);
-	struct nvme_queue *nvmeq = &dev->queues[0];
-
-	WARN_ON(hctx_idx != 0);
-	WARN_ON(dev->admin_tagset.tags[0] != hctx->tags);
+	struct nvme_queue *nvmeq = &dev->queues[qid];
+	struct blk_mq_tags *tags;

+	tags = qid ? dev->tagset.tags[qid - 1] : dev->admin_tagset.tags[0];
+	WARN_ON(tags != hctx->tags);
	hctx->driver_data = nvmeq;
	return 0;
 }

-static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
-			  unsigned int hctx_idx)
+static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+				unsigned int hctx_idx)
 {
-	struct nvme_dev *dev = to_nvme_dev(data);
-	struct nvme_queue *nvmeq = &dev->queues[hctx_idx + 1];
+	WARN_ON(hctx_idx != 0);
+	return nvme_init_hctx(hctx, data, 0);
+}

-	WARN_ON(dev->tagset.tags[hctx_idx] != hctx->tags);
-	hctx->driver_data = nvmeq;
-	return 0;
+static int nvme_io_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			     unsigned int hctx_idx)
+{
+	return nvme_init_hctx(hctx, data, hctx_idx + 1);
 }

 static int nvme_pci_init_request(struct blk_mq_tag_set *set,
		struct request *req, unsigned int hctx_idx,
		unsigned int numa_node)
@@ -1813,11 +1814,11 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 static const struct blk_mq_ops nvme_mq_ops = {
	.queue_rq	= nvme_queue_rq,
	.queue_rqs	= nvme_queue_rqs,
	.complete	= nvme_pci_complete_rq,
	.commit_rqs	= nvme_commit_rqs,
-	.init_hctx	= nvme_init_hctx,
+	.init_hctx	= nvme_io_init_hctx,
	.init_request	= nvme_pci_init_request,
	.map_queues	= nvme_pci_map_queues,
	.timeout	= nvme_timeout,
	.poll		= nvme_poll,
 };
-- 
2.45.2

From nobody Tue Dec 16 00:22:12 2025
From: Caleb Sander Mateos
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v6 3/3] nvme/pci: make PRP list DMA pools per-NUMA-node
Date: Fri, 25 Apr 2025 20:06:36 -0600
Message-ID: <20250426020636.34355-4-csander@purestorage.com>
In-Reply-To: <20250426020636.34355-1-csander@purestorage.com>
References: <20250426020636.34355-1-csander@purestorage.com>

NVMe commands with over 8 KB of discontiguous data allocate PRP list
pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool.
Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool
spinlock. These device-global spinlocks are a significant source of
contention when many CPUs are submitting to the same NVMe devices. On a
workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2
NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in
_raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.

Ideally, the dma_pools would be per-hctx to minimize contention. But
that could impose considerable resource costs in a system with many
NVMe devices and CPUs. As a compromise, allocate per-NUMA-node PRP list
DMA pools. Map each nvme_queue to the set of DMA pools corresponding to
its device and its hctx's NUMA node. This reduces the
_raw_spin_lock_irqsave overhead by about half, to 1.2%. Preventing the
sharing of PRP list pages across NUMA nodes also makes them cheaper to
initialize.

Link: https://lore.kernel.org/linux-nvme/CADUfDZqa=OOTtTTznXRDmBQo1WrFcDw1hBA7XwM7hzJ-hpckcA@mail.gmail.com/T/#u
Signed-off-by: Caleb Sander Mateos
Reviewed-by: Kanchan Joshi
Reviewed-by: Sagi Grimberg
Reviewed-by: Keith Busch
Reviewed-by: Jens Axboe
---
 drivers/nvme/host/pci.c | 144 +++++++++++++++++++++++-----------------
 1 file changed, 84 insertions(+), 60 deletions(-)
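As a sketch of how the pieces in the diff below fit together
(illustrative only; all of the names are introduced by this patch):

	/*
	 * dev->prp_pools[] is a flexible array with one entry per NUMA
	 * node. Each hctx initializes the pools for its node at most
	 * once and caches them in its nvme_queue, so the submission and
	 * completion paths take a node-local pool lock.
	 */
	struct nvme_prp_dma_pools *prp_pools;

	prp_pools = nvme_setup_prp_pools(dev, hctx->numa_node);
	if (IS_ERR(prp_pools))
		return PTR_ERR(prp_pools);
	nvmeq->prp_pools = *prp_pools;	/* cached for the hot path */

	/* I/O path: same dma_pool API, node-local pool. */
	prp_list = dma_pool_alloc(nvmeq->prp_pools.large, GFP_ATOMIC, &prp_dma);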
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 642890ddada5..2c554bb7f984 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -16,10 +16,11 @@
 #include
 #include
 #include
 #include
 #include
+#include <linux/nodemask.h>
 #include
 #include
 #include
 #include
 #include
@@ -110,21 +111,24 @@ struct nvme_queue;

 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static void nvme_delete_io_queues(struct nvme_dev *dev);
 static void nvme_update_attrs(struct nvme_dev *dev);

+struct nvme_prp_dma_pools {
+	struct dma_pool *large;
+	struct dma_pool *small;
+};
+
 /*
  * Represents an NVM Express device. Each nvme_dev is a PCI function.
  */
 struct nvme_dev {
	struct nvme_queue *queues;
	struct blk_mq_tag_set tagset;
	struct blk_mq_tag_set admin_tagset;
	u32 __iomem *dbs;
	struct device *dev;
-	struct dma_pool *prp_page_pool;
-	struct dma_pool *prp_small_pool;
	unsigned online_queues;
	unsigned max_qid;
	unsigned io_queues[HCTX_MAX_TYPES];
	unsigned int num_vecs;
	u32 q_depth;
@@ -160,10 +164,11 @@ struct nvme_dev {
	struct nvme_host_mem_buf_desc *host_mem_descs;
	void **host_mem_desc_bufs;
	unsigned int nr_allocated_queues;
	unsigned int nr_write_queues;
	unsigned int nr_poll_queues;
+	struct nvme_prp_dma_pools prp_pools[];
 };

 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 {
	return param_set_uint_minmax(val, kp, NVME_PCI_MIN_QUEUE_SIZE,
@@ -189,10 +194,11 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
  * An NVM Express queue. Each device has at least two (one for admin
  * commands and one for I/O commands).
  */
 struct nvme_queue {
	struct nvme_dev *dev;
+	struct nvme_prp_dma_pools prp_pools;
	spinlock_t sq_lock;
	void *sq_cmds;
	 /* only used for poll queues: */
	spinlock_t cq_poll_lock ____cacheline_aligned_in_smp;
	struct nvme_completion *cqes;
@@ -395,18 +401,67 @@ static int nvme_pci_npages_prp(void)
	unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
	unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
	return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
 }

+static struct nvme_prp_dma_pools *
+nvme_setup_prp_pools(struct nvme_dev *dev, unsigned numa_node)
+{
+	struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[numa_node];
+	size_t small_align = 256;
+
+	if (prp_pools->small)
+		return prp_pools; /* already initialized */
+
+	prp_pools->large = dma_pool_create_node("prp list page", dev->dev,
+						NVME_CTRL_PAGE_SIZE,
+						NVME_CTRL_PAGE_SIZE, 0,
+						numa_node);
+	if (!prp_pools->large)
+		return ERR_PTR(-ENOMEM);
+
+	if (dev->ctrl.quirks & NVME_QUIRK_DMAPOOL_ALIGN_512)
+		small_align = 512;
+
+	/* Optimisation for I/Os between 4k and 128k */
+	prp_pools->small = dma_pool_create_node("prp list 256", dev->dev,
+						256, small_align, 0, numa_node);
+	if (!prp_pools->small) {
+		dma_pool_destroy(prp_pools->large);
+		prp_pools->large = NULL;
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return prp_pools;
+}
+
+static void nvme_release_prp_pools(struct nvme_dev *dev)
+{
+	unsigned i;
+
+	for (i = 0; i < nr_node_ids; i++) {
+		struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[i];
+
+		dma_pool_destroy(prp_pools->large);
+		dma_pool_destroy(prp_pools->small);
+	}
+}
+
 static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned qid)
 {
	struct nvme_dev *dev = to_nvme_dev(data);
	struct nvme_queue *nvmeq = &dev->queues[qid];
+	struct nvme_prp_dma_pools *prp_pools;
	struct blk_mq_tags *tags;

	tags = qid ? dev->tagset.tags[qid - 1] : dev->admin_tagset.tags[0];
	WARN_ON(tags != hctx->tags);
+	prp_pools = nvme_setup_prp_pools(dev, hctx->numa_node);
+	if (IS_ERR(prp_pools))
+		return PTR_ERR(prp_pools);
+
+	nvmeq->prp_pools = *prp_pools;
	hctx->driver_data = nvmeq;
	return 0;
 }

 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
@@ -536,27 +591,28 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
	if (!sgl_threshold || avg_seg_size < sgl_threshold)
		return nvme_req(req)->flags & NVME_REQ_USERCMD;
	return true;
 }

-static void nvme_free_prps(struct nvme_dev *dev, struct request *req)
+static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
 {
	const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	dma_addr_t dma_addr = iod->first_dma;
	int i;

	for (i = 0; i < iod->nr_allocations; i++) {
		__le64 *prp_list = iod->list[i].prp_list;
		dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);

-		dma_pool_free(dev->prp_page_pool, prp_list, dma_addr);
+		dma_pool_free(nvmeq->prp_pools.large, prp_list, dma_addr);
		dma_addr = next_dma_addr;
	}
 }

-static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
+static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
+			    struct request *req)
 {
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

	if (iod->dma_len) {
		dma_unmap_page(dev->dev, iod->first_dma, iod->dma_len,
@@ -567,17 +623,17 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
	WARN_ON_ONCE(!iod->sgt.nents);

	dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);

	if (iod->nr_allocations == 0)
-		dma_pool_free(dev->prp_small_pool, iod->list[0].sg_list,
+		dma_pool_free(nvmeq->prp_pools.small, iod->list[0].sg_list,
			      iod->first_dma);
	else if (iod->nr_allocations == 1)
-		dma_pool_free(dev->prp_page_pool, iod->list[0].sg_list,
+		dma_pool_free(nvmeq->prp_pools.large, iod->list[0].sg_list,
			      iod->first_dma);
	else
-		nvme_free_prps(dev, req);
+		nvme_free_prps(nvmeq, req);
	mempool_free(iod->sgt.sgl, dev->iod_mempool);
 }

 static void nvme_print_sgl(struct scatterlist *sgl, int nents)
 {
@@ -591,11 +647,11 @@ static void nvme_print_sgl(struct scatterlist *sgl, int nents)
			i, &phys, sg->offset, sg->length, &sg_dma_address(sg),
			sg_dma_len(sg));
	}
 }

-static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
+static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
		struct request *req, struct nvme_rw_command *cmnd)
 {
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct dma_pool *pool;
	int length = blk_rq_payload_bytes(req);
@@ -627,14 +683,14 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
		goto done;
	}

	nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
	if (nprps <= (256 / 8)) {
-		pool = dev->prp_small_pool;
+		pool = nvmeq->prp_pools.small;
		iod->nr_allocations = 0;
	} else {
-		pool = dev->prp_page_pool;
+		pool = nvmeq->prp_pools.large;
		iod->nr_allocations = 1;
	}

	prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
	if (!prp_list) {
@@ -672,11 +728,11 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 done:
	cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl));
	cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
	return BLK_STS_OK;
 free_prps:
-	nvme_free_prps(dev, req);
+	nvme_free_prps(nvmeq, req);
	return BLK_STS_RESOURCE;
 bad_sgl:
	WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
			"Invalid SGL for payload:%d nents:%d\n",
			blk_rq_payload_bytes(req), iod->sgt.nents);
@@ -697,11 +753,11 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
	sge->addr = cpu_to_le64(dma_addr);
	sge->length = cpu_to_le32(entries * sizeof(*sge));
	sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4;
 }

-static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
+static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
		struct request *req, struct nvme_rw_command *cmd)
 {
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct dma_pool *pool;
	struct nvme_sgl_desc *sg_list;
@@ -717,14 +773,14 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
		nvme_pci_sgl_set_data(&cmd->dptr.sgl, sg);
		return BLK_STS_OK;
	}

	if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
-		pool = dev->prp_small_pool;
+		pool = nvmeq->prp_pools.small;
		iod->nr_allocations = 0;
	} else {
-		pool = dev->prp_page_pool;
+		pool = nvmeq->prp_pools.large;
		iod->nr_allocations = 1;
	}

	sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
	if (!sg_list) {
@@ -784,16 +840,16 @@ static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
 }

 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
		struct nvme_command *cmnd)
 {
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	blk_status_t ret = BLK_STS_RESOURCE;
	int rc;

	if (blk_rq_nr_phys_segments(req) == 1) {
-		struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
		struct bio_vec bv = req_bvec(req);

		if (!is_pci_p2pdma_page(bv.bv_page)) {
			if (!nvme_pci_metadata_use_sgls(dev, req) &&
			    (bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) +
@@ -824,13 +880,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
		ret = BLK_STS_TARGET;
		goto out_free_sg;
	}

	if (nvme_pci_use_sgls(dev, req, iod->sgt.nents))
-		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw);
+		ret = nvme_pci_setup_sgls(nvmeq, req, &cmnd->rw);
	else
-		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
+		ret = nvme_pci_setup_prps(nvmeq, req, &cmnd->rw);
	if (ret != BLK_STS_OK)
		goto out_unmap_sg;
	return BLK_STS_OK;

 out_unmap_sg:
@@ -841,10 +897,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 }

 static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
					     struct request *req)
 {
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct nvme_rw_command *cmnd = &iod->cmd.rw;
	struct nvme_sgl_desc *sg_list;
	struct scatterlist *sgl, *sg;
	unsigned int entries;
@@ -864,11 +921,11 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
	rc = dma_map_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req),
			     DMA_ATTR_NO_WARN);
	if (rc)
		goto out_free_sg;

-	sg_list = dma_pool_alloc(dev->prp_small_pool, GFP_ATOMIC, &sgl_dma);
+	sg_list = dma_pool_alloc(nvmeq->prp_pools.small, GFP_ATOMIC, &sgl_dma);
	if (!sg_list)
		goto out_unmap_sg;

	entries = iod->meta_sgt.nents;
	iod->meta_list.sg_list = sg_list;
@@ -946,11 +1003,11 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)

	nvme_start_request(req);
	return BLK_STS_OK;
 out_unmap_data:
	if (blk_rq_nr_phys_segments(req))
-		nvme_unmap_data(dev, req);
+		nvme_unmap_data(dev, req->mq_hctx->driver_data, req);
 out_free_cmd:
	nvme_cleanup_cmd(req);
	return ret;
 }

@@ -1036,10 +1093,11 @@ static void nvme_queue_rqs(struct rq_list *rqlist)
		nvme_submit_cmds(nvmeq, &submit_list);
	*rqlist = requeue_list;
 }

 static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
+						struct nvme_queue *nvmeq,
						struct request *req)
 {
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

	if (!iod->meta_sgt.nents) {
@@ -1047,11 +1105,11 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
			       rq_integrity_vec(req).bv_len,
			       rq_dma_dir(req));
		return;
	}

-	dma_pool_free(dev->prp_small_pool, iod->meta_list.sg_list,
+	dma_pool_free(nvmeq->prp_pools.small, iod->meta_list.sg_list,
		      iod->meta_dma);
	dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
	mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
 }

@@ -1059,14 +1117,14 @@ static __always_inline void nvme_pci_unmap_rq(struct request *req)
 {
	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
	struct nvme_dev *dev = nvmeq->dev;

	if (blk_integrity_rq(req))
-		nvme_unmap_metadata(dev, req);
+		nvme_unmap_metadata(dev, nvmeq, req);

	if (blk_rq_nr_phys_segments(req))
-		nvme_unmap_data(dev, req);
+		nvme_unmap_data(dev, nvmeq, req);
 }

 static void nvme_pci_complete_rq(struct request *req)
 {
	nvme_pci_unmap_rq(req);
@@ -2839,39 +2897,10 @@ static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
		return -EBUSY;
	nvme_dev_disable(dev, shutdown);
	return 0;
 }

-static int nvme_setup_prp_pools(struct nvme_dev *dev)
-{
-	size_t small_align = 256;
-
-	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
-					     NVME_CTRL_PAGE_SIZE,
-					     NVME_CTRL_PAGE_SIZE, 0);
-	if (!dev->prp_page_pool)
-		return -ENOMEM;
-
-	if (dev->ctrl.quirks & NVME_QUIRK_DMAPOOL_ALIGN_512)
-		small_align = 512;
-
-	/* Optimisation for I/Os between 4k and 128k */
-	dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
-					      256, small_align, 0);
-	if (!dev->prp_small_pool) {
-		dma_pool_destroy(dev->prp_page_pool);
-		return -ENOMEM;
-	}
-	return 0;
-}
-
-static void nvme_release_prp_pools(struct nvme_dev *dev)
-{
-	dma_pool_destroy(dev->prp_page_pool);
-	dma_pool_destroy(dev->prp_small_pool);
-}
-
 static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)
 {
	size_t meta_size = sizeof(struct scatterlist) * (NVME_MAX_META_SEGS + 1);
	size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS;

@@ -3182,11 +3211,12 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
	unsigned long quirks = id->driver_data;
	int node = dev_to_node(&pdev->dev);
	struct nvme_dev *dev;
	int ret = -ENOMEM;

-	dev = kzalloc_node(sizeof(*dev), GFP_KERNEL, node);
+	dev = kzalloc_node(sizeof(*dev) + nr_node_ids * sizeof(*dev->prp_pools),
+			   GFP_KERNEL, node);
	if (!dev)
		return ERR_PTR(-ENOMEM);
	INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
	mutex_init(&dev->shutdown_lock);

@@ -3257,17 +3287,13 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)

	result = nvme_dev_map(dev);
	if (result)
		goto out_uninit_ctrl;

-	result = nvme_setup_prp_pools(dev);
-	if (result)
-		goto out_dev_unmap;
-
	result = nvme_pci_alloc_iod_mempool(dev);
	if (result)
-		goto out_release_prp_pools;
+		goto out_dev_unmap;

	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));

	result = nvme_pci_enable(dev);
	if (result)
@@ -3339,12 +3365,10 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	nvme_dbbuf_dma_free(dev);
	nvme_free_queues(dev, 0);
 out_release_iod_mempool:
	mempool_destroy(dev->iod_mempool);
	mempool_destroy(dev->iod_meta_mempool);
-out_release_prp_pools:
-	nvme_release_prp_pools(dev);
 out_dev_unmap:
	nvme_dev_unmap(dev);
 out_uninit_ctrl:
	nvme_uninit_ctrl(&dev->ctrl);
 out_put_ctrl:
-- 
2.45.2