From nobody Wed Oct 1 21:27:11 2025
Date: Tue, 30 Sep 2025 23:34:14 -0700
From: Chris Leech
To: Dmitry Bogdanov
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Stuart Hayes, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, linux@yadro.com, stable@vger.kernel.org
Subject: [PATCH] nvme-tcp: switch to per-cpu page_frag_cache
Message-ID: <20250930-tableware-untaxed-6a68b2e1e970@redhat.com>
References: <20250929111951.6961-1-d.bogdanov@yadro.com>
In-Reply-To: <20250929111951.6961-1-d.bogdanov@yadro.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
nvme-tcp uses a page_frag_cache to preallocate a PDU for each preallocated
request of a block device. Block devices are created in parallel threads
and, consequently, the page_frag_cache is used in a non-thread-safe manner.
That leads to incorrect refcounting of the backing store pages and a
premature free.

That can be caught by the !sendpage_ok check inside the network stack:

  WARNING: CPU: 7 PID: 467 at ../net/core/skbuff.c:6931 skb_splice_from_iter+0xfa/0x310
  tcp_sendmsg_locked+0x782/0xce0
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x8b/0xa0
  nvme_tcp_try_send_cmd_pdu+0x149/0x2a0

Then a random panic may occur.

Fix that by switching from a per-queue page_frag_cache to a per-cpu
page_frag_cache.

Cc: stable@vger.kernel.org # 6.12
Fixes: 4e893ca81170 ("nvme_core: scan namespaces asynchronously")
Reported-by: Dmitry Bogdanov
Signed-off-by: Chris Leech
---
 drivers/nvme/host/tcp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1413788ca7d52..a4c4ace5be0f4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -174,7 +174,6 @@ struct nvme_tcp_queue {
 	__le32			recv_ddgst;
 	struct completion	tls_complete;
 	int			tls_err;
-	struct page_frag_cache	pf_cache;
 
 	void (*state_change)(struct sock *);
 	void (*data_ready)(struct sock *);
@@ -201,6 +200,7 @@ struct nvme_tcp_ctrl {
 
 static LIST_HEAD(nvme_tcp_ctrl_list);
 static DEFINE_MUTEX(nvme_tcp_ctrl_mutex);
+static DEFINE_PER_CPU(struct page_frag_cache, pf_cache);
 static struct workqueue_struct *nvme_tcp_wq;
 static const struct blk_mq_ops nvme_tcp_mq_ops;
 static const struct blk_mq_ops nvme_tcp_admin_mq_ops;
@@ -556,7 +556,7 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
 	struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx];
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 
-	req->pdu = page_frag_alloc(&queue->pf_cache,
+	req->pdu = page_frag_alloc(this_cpu_ptr(&pf_cache),
 			sizeof(struct nvme_tcp_cmd_pdu) + hdgst,
 			GFP_KERNEL | __GFP_ZERO);
 	if (!req->pdu)
@@ -1420,7 +1420,7 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
 	struct nvme_tcp_request *async = &ctrl->async_req;
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 
-	async->pdu = page_frag_alloc(&queue->pf_cache,
+	async->pdu = page_frag_alloc(this_cpu_ptr(&pf_cache),
 			sizeof(struct nvme_tcp_cmd_pdu) + hdgst,
 			GFP_KERNEL | __GFP_ZERO);
 	if (!async->pdu)
@@ -1439,7 +1439,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	if (!test_and_clear_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;
 
-	page_frag_cache_drain(&queue->pf_cache);
+	page_frag_cache_drain(this_cpu_ptr(&pf_cache));
 
 	noreclaim_flag = memalloc_noreclaim_save();
 	/* ->sock will be released by fput() */
-- 
2.50.1