From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 1/9] vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
Date: Thu, 16 Apr 2026 06:17:44 -0700
Message-ID: <20260416131815.2729131-2-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org

vfio_pci_dma_buf_cleanup() assumed all VFIO device DMABUFs need to be
revoked. However, if vfio_pci_dma_buf_move() revokes DMABUFs before
the fd/device closes, then vfio_pci_dma_buf_cleanup() would do a
second/underflowing kref_put() then wait_for_completion() on a
completion that never fires. Fixed by predicating on revocation
status.

This could happen if PCI_COMMAND_MEMORY is cleared before closing the
device fd (but the scenario is more likely to hit when future commits
add more methods to revoke DMABUFs).

Fixes: 1a8a5227f2299 ("vfio: Wait for dma-buf invalidation to complete")
Signed-off-by: Matt Evans
Reviewed-by: Jason Gunthorpe
---
(Just a fix, but later "vfio/pci: Convert BAR mmap() to use a DMABUF"
and "vfio/pci: Permanently revoke a DMABUF on request" depend on this
context, so including in this series.)
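
(Illustration only, not part of the patch: the double-put can be
modelled in userspace. This is a rough sketch under loud assumptions:
struct toy_dmabuf, toy_init(), toy_put(), toy_revoke() and
toy_cleanup() are hypothetical names made up here, a plain integer
stands in for the kref, and a counter of release callbacks stands in
for the completion. It only mirrors the "remember whether we were
already revoked" idea, not the real locking or dma-resv handling.)

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kref + completion; these are NOT the kernel types. */
struct toy_dmabuf {
	int refs;        /* models priv->kref */
	int completions; /* how many times the release callback fired */
	bool revoked;    /* models priv->revoked */
};

static void toy_init(struct toy_dmabuf *b)
{
	b->refs = 1;
	b->completions = 0;
	b->revoked = false;
}

/* Drop one reference; the final put "completes". Underflow is a bug. */
static void toy_put(struct toy_dmabuf *b)
{
	assert(b->refs > 0 && "refcount underflow");
	if (--b->refs == 0)
		b->completions++;
}

/* Models an earlier revocation (hypothetical analogue of the move path). */
static void toy_revoke(struct toy_dmabuf *b)
{
	if (b->revoked)
		return;
	b->revoked = true;
	toy_put(b);
}

/* Models the fixed cleanup: only put if nobody revoked before us. */
static void toy_cleanup(struct toy_dmabuf *b)
{
	bool was_revoked = b->revoked;

	b->revoked = true;
	if (!was_revoked)
		toy_put(b);
}
```

(In this model, toy_revoke() followed by toy_cleanup() leaves the
counter at zero after exactly one release. Without the was_revoked
check, the second toy_put() would trip the underflow assertion, which
is the toy analogue of the underflowing kref_put() described above.)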
 drivers/vfio/pci/vfio_pci_dmabuf.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 281ba7d69567..04478b7415a0 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -395,20 +395,25 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 
 	down_write(&vdev->memory_lock);
 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		bool was_revoked;
+
 		if (!get_file_active(&priv->dmabuf->file))
 			continue;
 
 		dma_resv_lock(priv->dmabuf->resv, NULL);
 		list_del_init(&priv->dmabufs_elm);
 		priv->vdev = NULL;
+		was_revoked = priv->revoked;
 		priv->revoked = true;
 		dma_buf_invalidate_mappings(priv->dmabuf);
 		dma_resv_wait_timeout(priv->dmabuf->resv,
 				      DMA_RESV_USAGE_BOOKKEEP, false,
 				      MAX_SCHEDULE_TIMEOUT);
 		dma_resv_unlock(priv->dmabuf->resv);
-		kref_put(&priv->kref, vfio_pci_dma_buf_done);
-		wait_for_completion(&priv->comp);
+		if (!was_revoked) {
+			kref_put(&priv->kref, vfio_pci_dma_buf_done);
+			wait_for_completion(&priv->comp);
+		}
 		vfio_device_put_registration(&vdev->vdev);
 		fput(priv->dmabuf->file);
 	}
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs
Date: Thu, 16 Apr 2026 06:17:45 -0700
Message-ID: <20260416131815.2729131-3-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add vfio_pci_dma_buf_find_pfn(), which a VMA fault handler can use to
find a PFN.

This supports multi-range DMABUFs, which typically would be used to
represent scattered spans but might even represent overlapping or
aliasing spans of PFNs.

Because this is intended to be used in vfio_pci_core.c, we also need
to expose the struct vfio_pci_dma_buf in the vfio_pci_priv.h header.

Signed-off-by: Matt Evans
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 124 ++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_priv.h   |  19 +++++
 2 files changed, 130 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 04478b7415a0..8b6bae56bbf2 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -9,19 +9,6 @@
 
 MODULE_IMPORT_NS("DMA_BUF");
 
-struct vfio_pci_dma_buf {
-	struct dma_buf *dmabuf;
-	struct vfio_pci_core_device *vdev;
-	struct list_head dmabufs_elm;
-	size_t size;
-	struct phys_vec *phys_vec;
-	struct p2pdma_provider *provider;
-	u32 nr_ranges;
-	struct kref kref;
-	struct completion comp;
-	u8 revoked : 1;
-};
-
 static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 				   struct dma_buf_attachment *attachment)
 {
@@ -106,6 +93,117 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
 	.release = vfio_pci_dma_buf_release,
 };
 
+int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
+			      struct vm_area_struct *vma,
+			      unsigned long address,
+			      unsigned int order,
+			      unsigned long *out_pfn)
+{
+	/*
+	 * Given a VMA (start, end, pgoffs) and a fault address,
+	 * search the corresponding DMABUF's phys_vec[] to find the
+	 * range representing the address's offset into the VMA, and
+	 * its PFN.
+	 *
+	 * The phys_vec[] ranges represent contiguous spans of VAs
+	 * upwards from the buffer offset 0; the actual PFNs might be
+	 * in any order, overlap/alias, etc.  Calculate an offset of
+	 * the desired page given VMA start/pgoff and address, then
+	 * search upwards from 0 to find which span contains it.
+	 *
+	 * On success, a valid PFN for a page sized by 'order' is
+	 * returned into out_pfn.
+	 *
+	 * Failure occurs if:
+	 * - The page would cross the edge of the VMA
+	 * - The page isn't entirely contained within a range
+	 * - We find a range, but the final PFN isn't aligned to the
+	 *   requested order.
+	 *
+	 * (Upon failure, the caller is expected to try again with a
+	 * smaller order; the tests above will always succeed for
+	 * order=0 as the limit case.)
+	 *
+	 * It's suboptimal if DMABUFs are created with neighbouring
+	 * ranges that are physically contiguous, since hugepages
+	 * can't straddle range boundaries.  (The construction of the
+	 * ranges vector should merge such ranges.)
+	 */
+
+	const unsigned long pagesize = PAGE_SIZE << order;
+	unsigned long rounded_page_addr = address & ~(pagesize - 1);
+	unsigned long rounded_page_end = rounded_page_addr + pagesize;
+	unsigned long buf_page_offset;
+	unsigned long buf_offset = 0;
+	unsigned int i;
+
+	if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) {
+		if (order > 0)
+			return -EAGAIN;
+
+		/* A fault address outside of the VMA is absurd. */
+		WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
+		     address, vma->vm_start, vma->vm_end);
+		return -EFAULT;
+	}
+
+	if (unlikely(check_add_overflow(rounded_page_addr - vma->vm_start,
+					vma->vm_pgoff << PAGE_SHIFT, &buf_page_offset)))
+		return -EFAULT;
+
+	for (i = 0; i < vpdmabuf->nr_ranges; i++) {
+		size_t range_len = vpdmabuf->phys_vec[i].len;
+		phys_addr_t range_start = vpdmabuf->phys_vec[i].paddr;
+
+		/*
+		 * If the current range starts after the page's span,
+		 * this and any future range won't match.  Bail early.
+		 */
+		if (buf_page_offset + pagesize <= buf_offset)
+			break;
+
+		if (buf_page_offset >= buf_offset &&
+		    buf_page_offset + pagesize <= buf_offset + range_len) {
+			/*
+			 * The faulting page is wholly contained
+			 * within the span represented by the range.
+			 * Validate PFN alignment for the order:
+			 */
+			unsigned long pfn = (range_start >> PAGE_SHIFT) +
+				((buf_page_offset - buf_offset) >> PAGE_SHIFT);
+
+			if (IS_ALIGNED(pfn, 1 << order)) {
+				*out_pfn = pfn;
+				return 0;
+			}
+			/* Retry with smaller order */
+			return -EAGAIN;
+		}
+		buf_offset += range_len;
+	}
+
+	/*
+	 * A hugepage straddling a range boundary will fail to match a
+	 * range, but the address will (eventually) match when retried
+	 * with a smaller page.
+	 */
+	if (order > 0)
+		return -EAGAIN;
+
+	/*
+	 * If we get here, the address fell outside of the span
+	 * represented by the (concatenated) ranges.  Setup of a
+	 * mapping must ensure that the VMA is <= the total size of
+	 * the ranges, so this should never happen.  But, if it does,
+	 * force SIGBUS for the access and warn.
+	 */
+	WARN_ONCE(1, "No range for addr 0x%lx, order %d: VMA 0x%lx-0x%lx pgoff 0x%lx, %u ranges, size 0x%zx\n",
+		  address, order, vma->vm_start, vma->vm_end, vma->vm_pgoff,
+		  vpdmabuf->nr_ranges, vpdmabuf->size);
+
+	return -EFAULT;
+}
+
 /*
  * This is a temporary "private interconnect" between VFIO DMABUF and iommufd.
  * It allows the two co-operating drivers to exchange the physical address of
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..317170a5b407 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -23,6 +23,19 @@ struct vfio_pci_ioeventfd {
 	bool test_mem;
 };
 
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	size_t size;
+	struct phys_vec *phys_vec;
+	struct p2pdma_provider *provider;
+	u32 nr_ranges;
+	struct kref kref;
+	struct completion comp;
+	u8 revoked : 1;
+};
+
 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
 
@@ -114,6 +127,12 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
+int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
+			      struct vm_area_struct *vma,
+			      unsigned long address,
+			      unsigned int order,
+			      unsigned long *out_pfn);
+
 #ifdef CONFIG_VFIO_PCI_DMABUF
 int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 				  struct vfio_device_feature_dma_buf __user *arg,
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 3/9] vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
Date: Thu, 16 Apr 2026 06:17:46 -0700
Message-ID: <20260416131815.2729131-4-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This helper, vfio_pci_core_mmap_prep_dmabuf(), creates a single-range
DMABUF for the purpose of mapping a PCI BAR. This is used in a future
commit by VFIO's ordinary mmap() path.

This function transfers ownership of the VFIO device fd to the
DMABUF, which fput()s when it's released.

Refactor the existing vfio_pci_core_feature_dma_buf() to split out
export code common to the two paths, VFIO_DEVICE_FEATURE_DMA_BUF and
this new VFIO_BAR mmap().

Signed-off-by: Matt Evans
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 143 +++++++++++++++++++++++------
 drivers/vfio/pci/vfio_pci_priv.h   |   5 +
 2 files changed, 118 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 8b6bae56bbf2..3554afbc8ebc 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -82,6 +82,8 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
 		up_write(&priv->vdev->memory_lock);
 		vfio_device_put_registration(&priv->vdev->vdev);
 	}
+	if (priv->vfile)
+		fput(priv->vfile);
 	kfree(priv->phys_vec);
 	kfree(priv);
 }
@@ -204,6 +206,45 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
 	return -EFAULT;
 }
 
+/*
+ * Create a DMABUF corresponding to priv, add it to vdev->dmabufs list
+ * for tracking (meaning cleanup or revocation will zap it), and take
+ * a vfio_device registration.
+ */
+static int vfio_pci_dmabuf_export(struct vfio_pci_core_device *vdev,
+				  struct vfio_pci_dma_buf *priv, uint32_t flags)
+{
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+
+	if (!vfio_device_try_get_registration(&vdev->vdev))
+		return -ENODEV;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = priv->size;
+	exp_info.flags = flags;
+	exp_info.priv = priv;
+
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		vfio_device_put_registration(&vdev->vdev);
+		return PTR_ERR(priv->dmabuf);
+	}
+
+	kref_init(&priv->kref);
+	init_completion(&priv->comp);
+
+	/* dma_buf_put() now frees priv */
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	return 0;
+}
+
 /*
  * This is a temporary "private interconnect" between VFIO DMABUF and iommufd.
  * It allows the two co-operating drivers to exchange the physical address of
@@ -322,7 +363,6 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 {
 	struct vfio_device_feature_dma_buf get_dma_buf = {};
 	struct vfio_region_dma_range *dma_ranges;
-	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	struct vfio_pci_dma_buf *priv;
 	size_t length;
 	int ret;
@@ -392,34 +432,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	kfree(dma_ranges);
 	dma_ranges = NULL;
 
-	if (!vfio_device_try_get_registration(&vdev->vdev)) {
-		ret = -ENODEV;
+	ret = vfio_pci_dmabuf_export(vdev, priv, get_dma_buf.open_flags);
+	if (ret)
 		goto err_free_phys;
-	}
-
-	exp_info.ops = &vfio_pci_dmabuf_ops;
-	exp_info.size = priv->size;
-	exp_info.flags = get_dma_buf.open_flags;
-	exp_info.priv = priv;
-
-	priv->dmabuf = dma_buf_export(&exp_info);
-	if (IS_ERR(priv->dmabuf)) {
-		ret = PTR_ERR(priv->dmabuf);
-		goto err_dev_put;
-	}
-
-	kref_init(&priv->kref);
-	init_completion(&priv->comp);
-
-	/* dma_buf_put() now frees priv */
-	INIT_LIST_HEAD(&priv->dmabufs_elm);
-	down_write(&vdev->memory_lock);
-	dma_resv_lock(priv->dmabuf->resv, NULL);
-	priv->revoked = !__vfio_pci_memory_enabled(vdev);
-	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
-	dma_resv_unlock(priv->dmabuf->resv);
-	up_write(&vdev->memory_lock);
-
 	/*
 	 * dma_buf_fd() consumes the reference, when the file closes the dmabuf
 	 * will be released.
@@ -430,8 +445,6 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 
 	return ret;
 
-err_dev_put:
-	vfio_device_put_registration(&vdev->vdev);
 err_free_phys:
 	kfree(priv->phys_vec);
 err_free_priv:
@@ -441,6 +454,76 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	return ret;
 }
 
+int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev,
+				   struct vm_area_struct *vma,
+				   u64 phys_start, u64 req_len,
+				   unsigned int res_index)
+{
+	struct vfio_pci_dma_buf *priv;
+	const unsigned int nr_ranges = 1;
+	int ret;
+
+	priv = kzalloc_obj(*priv);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->phys_vec = kzalloc_obj(*priv->phys_vec);
+	if (!priv->phys_vec) {
+		ret = -ENOMEM;
+		goto err_free_priv;
+	}
+
+	/*
+	 * The mmap() request's vma->vm_offs might be non-zero, but
+	 * the DMABUF is created from _offset zero_ of the BAR.  The
+	 * portion between zero and the vm_offs is inaccessible
+	 * through this VMA, but this approach keeps the
+	 * /proc//maps offset somewhat consistent with the
+	 * pre-DMABUF code.  Size includes the offset portion.
+	 *
+	 * This differs from an mmap() of an explicitly-exported
+	 * DMABUF which is an arbitrary slice of the BAR, would be
+	 * created with the desired offset+size, and would usually be
+	 * mmap()ed with pgoff = 0.
+	 *
+	 * Both are equivalent and vfio_pci_dma_buf_find_pfn() finds
+	 * the same PFNs.
+	 */
+	priv->vdev = vdev;
+	priv->nr_ranges = nr_ranges;
+	priv->size = (vma->vm_pgoff << PAGE_SHIFT) + req_len;
+	priv->provider = pcim_p2pdma_provider(vdev->pdev, res_index);
+	if (!priv->provider) {
+		ret = -EINVAL;
+		goto err_free_phys;
+	}
+
+	priv->phys_vec[0].paddr = phys_start;
+	priv->phys_vec[0].len = priv->size;
+
+	ret = vfio_pci_dmabuf_export(vdev, priv, O_CLOEXEC | O_RDWR);
+	if (ret)
+		goto err_free_phys;
+
+	/*
+	 * The VMA gets the DMABUF file so that other users can locate
+	 * the DMABUF via a VA.  Ownership of the original VFIO device
+	 * file being mmap()ed transfers to priv, and is put when the
+	 * DMABUF is released.
+	 */
+	priv->vfile = vma->vm_file;
+	vma->vm_file = priv->dmabuf->file;
+	vma->vm_private_data = priv;
+
+	return 0;
+
+err_free_phys:
+	kfree(priv->phys_vec);
+err_free_priv:
+	kfree(priv);
+	return ret;
+}
+
 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 {
 	struct vfio_pci_dma_buf *priv;
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 317170a5b407..3cff1b7eb47b 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -30,6 +30,7 @@ struct vfio_pci_dma_buf {
 	size_t size;
 	struct phys_vec *phys_vec;
 	struct p2pdma_provider *provider;
+	struct file *vfile;
 	u32 nr_ranges;
 	struct kref kref;
 	struct completion comp;
@@ -132,6 +133,10 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
 			      unsigned long address,
 			      unsigned int order,
 			      unsigned long *out_pfn);
+int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev,
+				   struct vm_area_struct *vma,
+				   u64 phys_start, u64 req_len,
+				   unsigned int res_index);
 
 #ifdef CONFIG_VFIO_PCI_DMABUF
 int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 4/9] vfio/pci: Convert BAR mmap() to use a DMABUF
Date: Thu, 16 Apr 2026 06:17:47 -0700
Message-ID: <20260416131815.2729131-5-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-16_03,2026-04-16_02,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Convert the VFIO device fd fops->mmap to create a DMABUF representing the BAR mapping, and make the VMA fault handler look up PFNs from the corresponding DMABUF. This supports future code mmap()ing BAR DMABUFs, and iommufd work to support Type1 P2P. First, vfio_pci_core_mmap() uses the new vfio_pci_core_mmap_prep_dmabuf() helper to export a DMABUF representing a single BAR range. Then, the vfio_pci_mmap_huge_fault() callback is updated to understand revoked buffers, and uses the new vfio_pci_dma_buf_find_pfn() helper to determine the PFN for a given fault address. Now that the VFIO DMABUFs can be mmap()ed, vfio_pci_dma_buf_move() and vfio_pci_dma_buf_cleanup() need to zap PTEs on revocation and cleanup paths. CONFIG_VFIO_PCI_CORE now unconditionally depends on CONFIG_DMA_SHARED_BUFFER. CONFIG_VFIO_PCI_DMABUF remains, to conditionally include support for VFIO_DEVICE_FEATURE_DMA_BUF, and depends on CONFIG_PCI_P2PDMA. Signed-off-by: Matt Evans --- drivers/vfio/pci/Kconfig | 3 +- drivers/vfio/pci/Makefile | 3 +- drivers/vfio/pci/vfio_pci_core.c | 86 ++++++++++++++++++------------ drivers/vfio/pci/vfio_pci_dmabuf.c | 14 +++++ drivers/vfio/pci/vfio_pci_priv.h | 11 +--- 5 files changed, 71 insertions(+), 46 deletions(-) diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 296bf01e185e..2074f2a941e1 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -6,6 +6,7 @@ config VFIO_PCI_CORE tristate select VFIO_VIRQFD select IRQ_BYPASS_MANAGER + select DMA_SHARED_BUFFER =20 config VFIO_PCI_INTX def_bool y if !S390 @@ -56,7 +57,7 @@ config VFIO_PCI_ZDEV_KVM To enable s390x KVM vfio-pci extensions, say Y. 
=20 config VFIO_PCI_DMABUF - def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER + def_bool y if PCI_P2PDMA =20 source "drivers/vfio/pci/mlx5/Kconfig" =20 diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 6138f1bf241d..881452ea89be 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only =20 -vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o +vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o vfio_pci_dmabuf.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D vfio_pci_zdev.o -vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) +=3D vfio_pci_dmabuf.o obj-$(CONFIG_VFIO_PCI_CORE) +=3D vfio-pci-core.o =20 vfio-pci-y :=3D vfio_pci.o diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 4e9091e5fcc2..c00a61d61250 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1648,18 +1648,6 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_= pci_core_device *vdev, u16 c up_write(&vdev->memory_lock); } =20 -static unsigned long vma_to_pfn(struct vm_area_struct *vma) -{ - struct vfio_pci_core_device *vdev =3D vma->vm_private_data; - int index =3D vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); - u64 pgoff; - - pgoff =3D vma->vm_pgoff & - ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); - - return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff; -} - vm_fault_t vfio_pci_vmf_insert_pfn(struct vfio_pci_core_device *vdev, struct vm_fault *vmf, unsigned long pfn, @@ -1687,23 +1675,42 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct v= m_fault *vmf, unsigned int order) { struct vm_area_struct *vma =3D vmf->vma; - struct vfio_pci_core_device *vdev =3D vma->vm_private_data; - unsigned long addr =3D vmf->address & ~((PAGE_SIZE << order) - 1); - unsigned long pgoff =3D (addr - vma->vm_start) >> PAGE_SHIFT; - unsigned long pfn 
=3D vma_to_pfn(vma) + pgoff; - vm_fault_t ret =3D VM_FAULT_FALLBACK; - - if (is_aligned_for_order(vma, addr, pfn, order)) { - scoped_guard(rwsem_read, &vdev->memory_lock) - ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); - } + struct vfio_pci_dma_buf *priv =3D vma->vm_private_data; + struct vfio_pci_core_device *vdev; + unsigned long pfn =3D 0; + vm_fault_t ret =3D VM_FAULT_SIGBUS; + + /* + * We can rely on the existence of both a DMABUF (priv) and + * the VFIO device it was exported from (vdev). This fault's + * VMA was established using vfio_pci_core_mmap_prep_dmabuf() + * which transfers ownership of the VFIO device fd to the + * DMABUF, and so the VFIO device is held open because the + * VMA's vm_file (DMABUF) is open. + * + * Since vfio_pci_dma_buf_cleanup() cannot have happened, + * vdev must be valid; we can take memory_lock. + */ + vdev =3D READ_ONCE(priv->vdev); + + scoped_guard(rwsem_read, &vdev->memory_lock) { + if (!priv->revoked) { + int pres =3D vfio_pci_dma_buf_find_pfn(priv, vma, + vmf->address, + order, &pfn); + + if (pres =3D=3D 0) + ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, + pfn, order); + else if (pres =3D=3D -EAGAIN) + ret =3D VM_FAULT_FALLBACK; + } =20 - dev_dbg_ratelimited(&vdev->pdev->dev, - "%s(,order =3D %d) BAR %ld page offset 0x%lx: 0x%x\n", - __func__, order, - vma->vm_pgoff >> - (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT), - pgoff, (unsigned int)ret); + dev_dbg_ratelimited(&vdev->pdev->dev, + "%s(order =3D %d) PFN 0x%lx, VA 0x%lx, pgoff 0x%lx: 0x%x\n", + __func__, order, pfn, vmf->address, + vma->vm_pgoff, (unsigned int)ret); + } =20 return ret; } @@ -1726,7 +1733,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev,= struct vm_area_struct *vma container_of(core_vdev, struct vfio_pci_core_device, vdev); struct pci_dev *pdev =3D vdev->pdev; unsigned int index; - u64 phys_len, req_len, pgoff, req_start; + u64 phys_len, req_len; int ret; =20 index =3D vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); @@ -1753,11 +1760,9 @@ 
int vfio_pci_core_mmap(struct vfio_device *core_vdev= , struct vm_area_struct *vma =20 phys_len =3D PAGE_ALIGN(pci_resource_len(pdev, index)); req_len =3D vma->vm_end - vma->vm_start; - pgoff =3D vma->vm_pgoff & - ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); - req_start =3D pgoff << PAGE_SHIFT; + vma->vm_pgoff &=3D VFIO_PCI_OFFSET_MASK >> PAGE_SHIFT; =20 - if (req_start + req_len > phys_len) + if ((vma->vm_pgoff << PAGE_SHIFT) + req_len > phys_len) return -EINVAL; =20 /* @@ -1768,7 +1773,20 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev= , struct vm_area_struct *vma if (ret) return ret; =20 - vma->vm_private_data =3D vdev; + /* + * Create a DMABUF with a single range corresponding to this + * mapping, and wire it into vma->vm_private_data. The VMA's + * vm_file becomes that of the DMABUF, and the DMABUF takes + * ownership of the VFIO device file (put upon DMABUF + * release). This maintains the behaviour of a live VMA + * mapping holding the VFIO device file open. + */ + ret =3D vfio_pci_core_mmap_prep_dmabuf(vdev, vma, + pci_resource_start(pdev, index), + req_len, index); + if (ret) + return ret; + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); =20 diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 3554afbc8ebc..a12432825e5e 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -9,6 +9,7 @@ =20 MODULE_IMPORT_NS("DMA_BUF"); =20 +#ifdef CONFIG_VFIO_PCI_DMABUF static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) { @@ -25,6 +26,7 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, =20 return 0; } +#endif /* CONFIG_VFIO_PCI_DMABUF */ =20 static void vfio_pci_dma_buf_done(struct kref *kref) { @@ -89,7 +91,9 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmab= uf) } =20 static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { +#ifdef 
CONFIG_VFIO_PCI_DMABUF .attach =3D vfio_pci_dma_buf_attach, +#endif .map_dma_buf =3D vfio_pci_dma_buf_map, .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, .release =3D vfio_pci_dma_buf_release, @@ -245,6 +249,7 @@ static int vfio_pci_dmabuf_export(struct vfio_pci_core_= device *vdev, return 0; } =20 +#ifdef CONFIG_VFIO_PCI_DMABUF /* * This is a temporary "private interconnect" between VFIO DMABUF and iomm= ufd. * It allows the two co-operating drivers to exchange the physical address= of @@ -453,6 +458,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, kfree(dma_ranges); return ret; } +#endif /* CONFIG_VFIO_PCI_DMABUF */ =20 int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev, struct vm_area_struct *vma, @@ -530,6 +536,10 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device= *vdev, bool revoked) struct vfio_pci_dma_buf *tmp; =20 lockdep_assert_held_write(&vdev->memory_lock); + /* + * Holding memory_lock ensures a racing VMA fault observes + * priv->revoked properly. 
+ */ =20 list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!get_file_active(&priv->dmabuf->file)) @@ -547,6 +557,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device = *vdev, bool revoked) if (revoked) { kref_put(&priv->kref, vfio_pci_dma_buf_done); wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); } else { /* * Kref is initialize again, because when revoke @@ -594,6 +606,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) if (!was_revoked) { kref_put(&priv->kref, vfio_pci_dma_buf_done); wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); } vfio_device_put_registration(&vdev->vdev); fput(priv->dmabuf->file); diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 3cff1b7eb47b..868a54ba482c 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -137,13 +137,13 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_co= re_device *vdev, struct vm_area_struct *vma, u64 phys_start, u64 req_len, unsigned int res_index); +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked= ); =20 #ifdef CONFIG_VFIO_PCI_DMABUF int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, struct vfio_device_feature_dma_buf __user *arg, size_t argsz); -void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); -void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked= ); #else static inline int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, @@ -152,13 +152,6 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_dev= ice *vdev, u32 flags, { return -ENOTTY; } -static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *v= dev) -{ -} -static inline void vfio_pci_dma_buf_move(struct 
vfio_pci_core_device *vdev, - bool revoked) -{ -} #endif =20 #endif --=20 2.47.3 From nobody Tue Jun 16 04:59:01 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E949939E185; Thu, 16 Apr 2026 13:18:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776345539; cv=none; b=FGplEm2sCGy9yLj9ImeOGEs/enXUdWKnFudexpKoBMXrdaguA6+YnF5+d1R61D54JbBkkP3S3hlqyp24nbTY+VRcciA6geiubMh9ULqD2b0iIDGdutC1hdGDDHkLgDMaYzjEn7nTamLNrr++d+C5gOCveN5ZOoYGUnfxutzIC/g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776345539; c=relaxed/simple; bh=9bJhLRPbBath4yLXd4IyGt3YpmfQ4OA4d3IgqdC3pUc=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=tYzKVnpevdmHamWlssBCIGZ+AIjT5X/Vtzw4+no4rNa7KXryMA0izNIK0ZJkvpZnm/EkAr+Q6IV7Nh1K5yx54j2WXxGi0RfoS+O8OyyRQA9kXnGZQbaS7q1ihoMslQmR62TovFM2C2NQ4SHtVHPDear3EX47JBa2L5m+uwRn/KY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=E9q2FNnQ; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="E9q2FNnQ" Received: from pps.filterd (m0528006.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63G4Q9pW3203131; Thu, 16 Apr 2026 06:18:45 -0700 DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=iTNAuwSiBRgPAjZmQLfOkizde4HCQ7hisKa2aJG19b4=; b=E9q2FNnQLmSa quxk3DncgPKrbkIYi9lB3nr2phNOoFkscrZTlnxACXmFyI2NwAPjcf/iLlRjj8sE TAULw7QlVh0Qzlb+bkaNCXNMaVd7LdswBMdJk5nv5INkHV9UfBDjD9ZnOaov4o4Z v8vuoYod7YMiErd/AV8adAb/uqE0Xekxkj5hmCZx2l3jopOsMc4LxV8iIAb+1PL4 ufXFon8jCcUMu4M1HkSSVnpaw/5czBWp87t3Pyxh2v8Djet09RUsa+unyW2v+0sb zvaozCTHZf3c5hlbs8NDxLf/SAH/XkJwuS/02rb2XIFCkGTwC5wZqmtKcjamnL3L Z1hWNPrL9Q== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dh8551x0w-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 16 Apr 2026 06:18:45 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1c::1b) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Thu, 16 Apr 2026 13:18:44 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , =?UTF-8?q?Christian=20K=C3=B6nig?= CC: Mahmoud Adam , David Matlack , =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [PATCH 5/9] vfio/pci: Provide a user-facing name for BAR mappings Date: Thu, 16 Apr 2026 06:17:48 -0700 Message-ID: <20260416131815.2729131-6-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260416131815.2729131-1-mattev@meta.com> References: <20260416131815.2729131-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Authority-Analysis: v=2.4 cv=Fuw1OWrq c=1 sm=1 tr=0 ts=69e0e1b5 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=A5OVakUREuEA:10 
a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=kkcUborcUVj0H7zxAXTl:22 a=VabnemYjAAAA:8 a=3OuInECnMAfxEXBjkJUA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-ORIG-GUID: KtX4RQyQeJBe1k-4hJ-K_Qc_DBJP82zF X-Proofpoint-GUID: KtX4RQyQeJBe1k-4hJ-K_Qc_DBJP82zF X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDE2MDEyNiBTYWx0ZWRfXyIzy/ycBhTVA vgHGBcFtAf77DgN8K4R6LrcYubNhG87+p9fIC9r4rPVeMA9bHO3AHATEDhirQvUHlA9GW7ojm5g zJ6aS0D1QhrVkaO8TRCpRsr1v6N4/JM4IOQqw1GVwLOI5AgpiGMmYvTE9iK9bTc9UAQSwMN+l2o 6j2qc4DG7Q3orzdJbQtSBy/gviSj6r7izI9LSHiqaH+vGz6bxA4UZ+vCz/feu4R4mokVAb/Nam1 XLhARODJwUhHLqufnpiZ57c5V+Cblju/Y1e87UxV80MwtqIwwL0z2/pPuxoqDbENPIs6pwAzS3P BWIAdEViJ3IZCNHY0lLcuypYu7X5GisqE9lUKVdzZnVLCAmHDeZ/QMzAPdZybdKeN5sWwZm5B0w mPuHlOcgq127JG5S8vxaWaLQiF3ZFZBwPxAM+K80YHiMbK4FD41M29SJLmbfepqqqAQaYW87YRI TMk96/pRjwKbdTtrueg== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-16_03,2026-04-16_02,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Since converting BAR mmap()s to using DMABUFs, we lose the original device path in /proc//maps, lsof, etc. Generate a debug-oriented synthetic 'filename' based on the cdev, plus BDF, plus resource index. This applies only to BAR mappings via the VFIO device fd, as explicitly-exported DMABUFs are named by userspace via the DMA_BUF_SET_NAME ioctl. 
Signed-off-by: Matt Evans Reviewed-by: Jason Gunthorpe --- drivers/vfio/pci/vfio_pci_dmabuf.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index a12432825e5e..04c7733fe712 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -4,6 +4,7 @@ #include #include #include +#include =20 #include "vfio_pci_priv.h" =20 @@ -467,6 +468,7 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core= _device *vdev, { struct vfio_pci_dma_buf *priv; const unsigned int nr_ranges =3D 1; + char *bufname; int ret; =20 priv =3D kzalloc_obj(*priv); @@ -479,6 +481,20 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_cor= e_device *vdev, goto err_free_priv; } =20 + bufname =3D kzalloc(DMA_BUF_NAME_LEN, GFP_KERNEL); + if (!bufname) { + ret =3D -ENOMEM; + goto err_free_phys; + } + + /* + * Maximum size of the friendly debug name is + * vfio1234567890:ffff:ff:3f.7-9 =3D 30, which fits within + * DMA_BUF_NAME_LEN. + */ + snprintf(bufname, DMA_BUF_NAME_LEN, "%s:%s/%x", + dev_name(&vdev->vdev.device), pci_name(vdev->pdev), res_index); + /* * The mmap() request's vma->vm_offs might be non-zero, but * the DMABUF is created from _offset zero_ of the BAR. 
The @@ -501,7 +517,7 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core= _device *vdev, priv->provider =3D pcim_p2pdma_provider(vdev->pdev, res_index); if (!priv->provider) { ret =3D -EINVAL; - goto err_free_phys; + goto err_free_name; } =20 priv->phys_vec[0].paddr =3D phys_start; @@ -509,7 +525,7 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core= _device *vdev, =20 ret =3D vfio_pci_dmabuf_export(vdev, priv, O_CLOEXEC | O_RDWR); if (ret) - goto err_free_phys; + goto err_free_name; =20 /* * The VMA gets the DMABUF file so that other users can locate @@ -521,8 +537,15 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_cor= e_device *vdev, vma->vm_file =3D priv->dmabuf->file; vma->vm_private_data =3D priv; =20 + spin_lock(&priv->dmabuf->name_lock); + kfree(priv->dmabuf->name); + priv->dmabuf->name =3D bufname; + spin_unlock(&priv->dmabuf->name_lock); + return 0; =20 +err_free_name: + kfree(bufname); err_free_phys: kfree(priv->phys_vec); err_free_priv: --=20 2.47.3 From nobody Tue Jun 16 04:59:01 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDA6D3CBE6C; Thu, 16 Apr 2026 13:19:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776345543; cv=none; b=ZyFqHy/ooosMI1xa46KdK9t53A1t8BzEpeAWqcwVZqCPG5efzxjjKF8hCV1K67i6j9o1cFnI7rs6SixRnLga91ZeP5HM3mt80bTTu2gHj6Gg0ex7FpAcifjrNVlfu5bok9+zzlzT1lHqMIi1sWOUBbc/TAOTRqmtU2AKCqAvUTA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776345543; c=relaxed/simple; bh=zUO2lHul+QkL8YX/jCMMcgivZ8YGHKOPtc3j2JLJApQ=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=fpV4zjVORyTVWfx4XsBWEQ8F9muk1AQgDdSBXLc+v6tXT/J4t1Pdeyb74RyAM64bxc3a0V+iN8gEQnBU/AEkugiHD4AujEmoZAu1vnopfZ5WzSg0QR6hkR7zusPM1f1wrduvyua4LMAoOibeT0VtDtBOyWRAuM0rmyVzUXEsWgY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=NXqox/Fq; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="NXqox/Fq" Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63G3t62K2770074; Thu, 16 Apr 2026 06:18:47 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=lshZxh7nH27kspEDqtsBaSxImoGtUZtyrZMu3Gs/EqI=; b=NXqox/Fq1AiL Yr6K1CVeGCbkTfusLATx7xZeiRruLoHjvLZf39g9Th1SCvunXKX36tjgxCn0rLl3 MZINtJAGQAt/hiUA/lUX6nmOtjLX7nRv+MV2jXys0zoB/OPl7OOctPysLbsAV9Qm xgjZC4N711G07o+i3kg0qP8q7lejeEOIZDcP0lpRL214SEo/cS9+aosmrK7DtHcw xIAQJEFaXOP6ddg8rlSsHHAuuJ8enQk5HrLtXMVwWrGsyrJfLqCkMz2rIEN2oZTr KJeJUivj+ybTnGCikrNlwck3Ya5TQpplXiW/AFDrcSdoaPuXaE32J0F+X6m0epge 9X1m82IOyg== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dh84v1ydq-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 16 Apr 2026 06:18:47 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Thu, 16 Apr 2026 13:18:46 
+0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , =?UTF-8?q?Christian=20K=C3=B6nig?= CC: Mahmoud Adam , David Matlack , =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [PATCH 6/9] vfio/pci: Clean up BAR zap and revocation Date: Thu, 16 Apr 2026 06:17:49 -0700 Message-ID: <20260416131815.2729131-7-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260416131815.2729131-1-mattev@meta.com> References: <20260416131815.2729131-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDE2MDEyNyBTYWx0ZWRfXxfzGQe7TMxfc 6Sb7ce7ON/CawW+jFMEJNneiz+idXOXjg9Va2nOEruzXG2mLaQ2X4Crb/Y1kGldJQKE5xc/7VY1 rMPDJ2/Dxac83vAwg4LV0piLz63DAnSPy9EagNmbFQjSisC2N4Tvtz8rFW4JRn+caYeOS/Qik9E A/zFHjyaDtgxC9qCaj0N0CDrGtRK0EUiFntRto0KY+nEy5vqRacNWt+9CxVHrCk+vHuD5+hqtbz iFkAwDXnvk+ny5qTn4Lwi9tshxVZd78/DzFhhtc7D6XMVSG7aL2l6adUEKgpn0WTUChLwanNzJQ Ks6ZDLzQkpavy5SnFWzmrSiH6/e+u3LNgeNhmiiP8EQQujjDvcKY2UkwNL5PLzEsf45fTLJf3L3 DeVfnCbIDHWauQyOuioLp7Ys99can+aKIobj8o8bzJm9uLy147NS07O2T4WtBW1WM94k02i16wD UhvWT3twKntM3Idi+FQ== X-Proofpoint-GUID: sFH0ApgAOq6UN1Ugul8bC7lSe2FepIIz X-Authority-Analysis: v=2.4 cv=NfLWEWD4 c=1 sm=1 tr=0 ts=69e0e1b7 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=A5OVakUREuEA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=JnKecZnUtZousrUlYMGU:22 a=VabnemYjAAAA:8 a=c4k9Xcs0TiChc187yiQA:9 a=O8hF6Hzn-FEA:10 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-ORIG-GUID: sFH0ApgAOq6UN1Ugul8bC7lSe2FepIIz X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-16_03,2026-04-16_02,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Previously, 
vfio_pci_zap_bars() (and the wrapper vfio_pci_zap_and_down_write_memory_lock()) calls were paired with calls of vfio_pci_dma_buf_move(). This commit replaces them with a unified new function, vfio_pci_zap_revoke_bars(), containing both the vfio_pci_dma_buf_move() and the unmap_mapping_range(), making it harder for callers to omit one. It adds a wrapper, vfio_pci_lock_zap_revoke_bars(), which takes the write memory_lock before zapping, and adds a new vfio_pci_unrevoke_bars() for the re-enable path. However, as of "vfio/pci: Convert BAR mmap() to use a DMABUF" the unmap_mapping_range() to zap is entirely redundant for plain vfio-pci, since the DMABUFs used for BAR mappings already zap PTEs when the vfio_pci_dma_buf_move() occurs. One exception remains as a FIXME: in nvgrace-gpu, some BAR VMAs conditionally use custom vm_ops, which have not moved to be backed by DMABUFs. If these BARs are mmap()ed, the vdev enables the existing behaviour of unmap_mapping_range() for the device fd address space. Signed-off-by: Matt Evans --- drivers/vfio/pci/nvgrace-gpu/main.c | 5 +++ drivers/vfio/pci/vfio_pci_config.c | 30 ++++++-------- drivers/vfio/pci/vfio_pci_core.c | 62 +++++++++++++++++++---------- drivers/vfio/pci/vfio_pci_priv.h | 3 +- include/linux/vfio_pci_core.h | 1 + 5 files changed, 62 insertions(+), 39 deletions(-) diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace= -gpu/main.c index c1df437754f9..5304d15b9a2b 100644 --- a/drivers/vfio/pci/nvgrace-gpu/main.c +++ b/drivers/vfio/pci/nvgrace-gpu/main.c @@ -358,6 +358,8 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vd= ev, struct nvgrace_gpu_pci_core_device *nvdev =3D container_of(core_vdev, struct nvgrace_gpu_pci_core_device, core_device.vdev); + struct vfio_pci_core_device *vdev =3D + container_of(core_vdev, struct vfio_pci_core_device, vdev); struct mem_region *memregion; u64 req_len, pgoff, end; unsigned int index; @@ -368,6 +370,9 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vd=
ev, if (!memregion) return vfio_pci_core_mmap(core_vdev, vma); =20 + /* Non-DMABUF BAR mappings need an extra zap */ + vdev->bar_needs_zap =3D true; + /* * Request to mmap the BAR. Map to the CPU accessible memory on the * GPU using the memory information gathered from the system ACPI diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci= _config.c index a10ed733f0e3..8bfab0da481c 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -590,12 +590,10 @@ static int vfio_basic_config_write(struct vfio_pci_co= re_device *vdev, int pos, virt_mem =3D !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY); new_mem =3D !!(new_cmd & PCI_COMMAND_MEMORY); =20 - if (!new_mem) { - vfio_pci_zap_and_down_write_memory_lock(vdev); - vfio_pci_dma_buf_move(vdev, true); - } else { + if (!new_mem) + vfio_pci_lock_zap_revoke_bars(vdev); + else down_write(&vdev->memory_lock); - } =20 /* * If the user is writing mem/io enable (new_mem/io) and we @@ -631,7 +629,7 @@ static int vfio_basic_config_write(struct vfio_pci_core= _device *vdev, int pos, *virt_cmd |=3D cpu_to_le16(new_cmd & mask); =20 if (__vfio_pci_memory_enabled(vdev)) - vfio_pci_dma_buf_move(vdev, false); + vfio_pci_unrevoke_bars(vdev); up_write(&vdev->memory_lock); } =20 @@ -712,16 +710,14 @@ static int __init init_pci_cap_basic_perm(struct perm= _bits *perm) static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vde= v, pci_power_t state) { - if (state >=3D PCI_D3hot) { - vfio_pci_zap_and_down_write_memory_lock(vdev); - vfio_pci_dma_buf_move(vdev, true); - } else { + if (state >=3D PCI_D3hot) + vfio_pci_lock_zap_revoke_bars(vdev); + else down_write(&vdev->memory_lock); - } =20 vfio_pci_set_power_state(vdev, state); if (__vfio_pci_memory_enabled(vdev)) - vfio_pci_dma_buf_move(vdev, false); + vfio_pci_unrevoke_bars(vdev); up_write(&vdev->memory_lock); } =20 @@ -908,11 +904,10 @@ static int vfio_exp_config_write(struct vfio_pci_core= _device *vdev, int pos, 
 					   &cap);
 
 		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
-			vfio_pci_zap_and_down_write_memory_lock(vdev);
-			vfio_pci_dma_buf_move(vdev, true);
+			vfio_pci_lock_zap_revoke_bars(vdev);
 			pci_try_reset_function(vdev->pdev);
 			if (__vfio_pci_memory_enabled(vdev))
-				vfio_pci_dma_buf_move(vdev, false);
+				vfio_pci_unrevoke_bars(vdev);
 			up_write(&vdev->memory_lock);
 		}
 	}
@@ -993,11 +988,10 @@ static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
 					  &cap);
 
 		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
-			vfio_pci_zap_and_down_write_memory_lock(vdev);
-			vfio_pci_dma_buf_move(vdev, true);
+			vfio_pci_lock_zap_revoke_bars(vdev);
 			pci_try_reset_function(vdev->pdev);
 			if (__vfio_pci_memory_enabled(vdev))
-				vfio_pci_dma_buf_move(vdev, false);
+				vfio_pci_unrevoke_bars(vdev);
 			up_write(&vdev->memory_lock);
 		}
 	}
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index c00a61d61250..464b63585bef 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -319,8 +319,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
 	 * The vdev power related flags are protected with 'memory_lock'
 	 * semaphore.
 	 */
-	vfio_pci_zap_and_down_write_memory_lock(vdev);
-	vfio_pci_dma_buf_move(vdev, true);
+	vfio_pci_lock_zap_revoke_bars(vdev);
 
 	if (vdev->pm_runtime_engaged) {
 		up_write(&vdev->memory_lock);
@@ -406,7 +405,7 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
 	down_write(&vdev->memory_lock);
 	__vfio_pci_runtime_pm_exit(vdev);
 	if (__vfio_pci_memory_enabled(vdev))
-		vfio_pci_dma_buf_move(vdev, false);
+		vfio_pci_unrevoke_bars(vdev);
 	up_write(&vdev->memory_lock);
 }
 
@@ -1229,7 +1228,7 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	if (!vdev->reset_works)
 		return -EINVAL;
 
-	vfio_pci_zap_and_down_write_memory_lock(vdev);
+	vfio_pci_lock_zap_revoke_bars(vdev);
 
 	/*
 	 * This function can be invoked while the power state is non-D0. If
@@ -1242,10 +1241,9 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
-	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
 	if (__vfio_pci_memory_enabled(vdev))
-		vfio_pci_dma_buf_move(vdev, false);
+		vfio_pci_unrevoke_bars(vdev);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1613,20 +1611,44 @@ ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *bu
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_write);
 
-static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev)
+static void vfio_pci_zap_revoke_bars(struct vfio_pci_core_device *vdev)
 {
-	struct vfio_device *core_vdev = &vdev->vdev;
-	loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
-	loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);
-	loff_t len = end - start;
+	lockdep_assert_held_write(&vdev->memory_lock);
+	vfio_pci_dma_buf_move(vdev, true);
 
-	unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true);
+	/*
+	 * All VFIO PCI BARs are backed by DMABUFs, with the current
+	 * exception of the nvgrace-gpu device which uses its own
+	 * vm_ops for a subset of BARs. For this, BAR mappings are
+	 * still made in the vdev's address_space, and a zap is
+	 * required. The tracking is crude, and will (harmlessly)
+	 * continue to zap if the special BAR is unmapped, but that
+	 * behaviour isn't the common case.
+	 *
+	 * FIXME: This can go away if the special nvgrace-gpu mapping
+	 * is converted to use DMABUF.
+	 */
+	if (vdev->bar_needs_zap) {
+		struct vfio_device *core_vdev = &vdev->vdev;
+		loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
+		loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);
+		loff_t len = end - start;
+
+		unmap_mapping_range(core_vdev->inode->i_mapping,
+				    start, len, true);
+	}
 }
 
-void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
+void vfio_pci_lock_zap_revoke_bars(struct vfio_pci_core_device *vdev)
 {
 	down_write(&vdev->memory_lock);
-	vfio_pci_zap_bars(vdev);
+	vfio_pci_zap_revoke_bars(vdev);
+}
+
+void vfio_pci_unrevoke_bars(struct vfio_pci_core_device *vdev)
+{
+	lockdep_assert_held_write(&vdev->memory_lock);
+	vfio_pci_dma_buf_move(vdev, false);
 }
 
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev)
@@ -2480,9 +2502,10 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		}
 
 		/*
-		 * Take the memory write lock for each device and zap BAR
-		 * mappings to prevent the user accessing the device while in
-		 * reset. Locking multiple devices is prone to deadlock,
+		 * Take the memory write lock for each device and
+		 * zap/revoke BAR mappings to prevent the user (or
+		 * peers) accessing the device while in reset.
+		 * Locking multiple devices is prone to deadlock,
 		 * runaway and unwind if we hit contention.
 		 */
 		if (!down_write_trylock(&vdev->memory_lock)) {
@@ -2490,8 +2513,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 			break;
 		}
 
-		vfio_pci_dma_buf_move(vdev, true);
-		vfio_pci_zap_bars(vdev);
+		vfio_pci_zap_revoke_bars(vdev);
 	}
 
 	if (!list_entry_is_head(vdev,
@@ -2521,7 +2543,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
 					 vdev.dev_set_list) {
 		if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
-			vfio_pci_dma_buf_move(vdev, false);
+			vfio_pci_unrevoke_bars(vdev);
 		up_write(&vdev->memory_lock);
 	}
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 868a54ba482c..a8edbee6ce56 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -82,7 +82,8 @@ void vfio_config_free(struct vfio_pci_core_device *vdev);
 int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
 			     pci_power_t state);
 
-void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev);
+void vfio_pci_lock_zap_revoke_bars(struct vfio_pci_core_device *vdev);
+void vfio_pci_unrevoke_bars(struct vfio_pci_core_device *vdev);
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev,
 					u16 cmd);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ea4e773c121..c1cd67741125 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ struct vfio_pci_core_device {
 	bool			needs_pm_restore:1;
 	bool			pm_intx_masked:1;
 	bool			pm_runtime_engaged:1;
+	bool			bar_needs_zap:1;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 7/9] vfio/pci: Support mmap() of a VFIO DMABUF
Date: Thu, 16 Apr 2026 06:17:50 -0700
Message-ID: <20260416131815.2729131-8-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

A VFIO DMABUF can export a subset of a BAR to userspace by fd; add
support for mmap() of this fd. This provides another route for a
process to map BARs, but one in which the process can only map the
specific subset of a BAR represented by the exported DMABUF.

mmap() support enables userspace driver designs that safely delegate
access to BAR sub-ranges to other client processes by sharing a DMABUF
fd, without having to share the (omnipotent) VFIO device fd with them.

Since the main VFIO BAR mmap() is now DMABUF-aware, this path reuses
the existing vm_ops. However, the lifecycle of an exported DMABUF is
still decoupled from that of the device fd it came from, so the device
fd might now be closed concurrently with a VMA fault. Extra
synchronisation is added to deal with a fault racing with the DMABUF
cleanup path: the fault handler temporarily takes a VFIO device
registration to ensure vdev remains valid, after which
vdev->memory_lock can be taken.

(Note that this differs from a DMABUF implicitly created on the mmap()
path, which holds ownership of the device fd and so prevents
close-during-fault scenarios in order to maintain the same user-facing
behaviour on close.)
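
For illustration only (not part of this series), a client process that
has received such a DMABUF fd might map part of it roughly as below.
This is a minimal sketch under stated assumptions: map_dmabuf_window()
is a hypothetical helper name, the fd is assumed to have been exported
by the driver process and passed over (e.g.) a UNIX socket, and
buf_size is whatever size the exporter communicated out of band. The
local checks only mirror what the mmap handler in this patch enforces
(MAP_SHARED, and offset + length within the exported size).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map [offset, offset + len) of an already-exported
 * DMABUF fd. Returns MAP_FAILED with errno set on failure.
 */
void *map_dmabuf_window(int dmabuf_fd, size_t buf_size,
			off_t offset, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	/* Fail early on requests the kernel side would reject anyway. */
	if (page <= 0 || offset < 0 || len == 0 ||
	    (offset % page) != 0 ||
	    (size_t)offset > buf_size || len > buf_size - (size_t)offset) {
		errno = EINVAL;
		return MAP_FAILED;
	}

	/* The handler in this patch returns -EINVAL without VM_SHARED. */
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		    dmabuf_fd, offset);
}
```

A later access through the returned pointer can still raise SIGBUS if
the buffer is revoked or the device goes away, so a real client would
need a policy for that; the sketch does not cover it.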
Signed-off-by: Matt Evans
---
 drivers/vfio/pci/vfio_pci_core.c   | 79 ++++++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_dmabuf.c | 28 +++++++++++
 drivers/vfio/pci/vfio_pci_priv.h   |  2 +
 3 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 464b63585bef..cad126cf8737 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -12,6 +12,8 @@
 
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -1703,20 +1705,76 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 
 	/*
-	 * We can rely on the existence of both a DMABUF (priv) and
-	 * the VFIO device it was exported from (vdev). This fault's
-	 * VMA was established using vfio_pci_core_mmap_prep_dmabuf()
-	 * which transfers ownership of the VFIO device fd to the
-	 * DMABUF, and so the VFIO device is held open because the
-	 * VMA's vm_file (DMABUF) is open.
+	 * The only thing this can rely on is that the DMABUF relating
+	 * to the VMA's vm_file exists (priv).
 	 *
-	 * Since vfio_pci_dma_buf_cleanup() cannot have happened,
-	 * vdev must be valid; we can take memory_lock.
+	 * A DMABUF for a VFIO device fd mmap() holds a reference to
+	 * the original VFIO device fd, but an explicitly-exported
+	 * DMABUF does not. The original fd might have closed,
+	 * meaning this fault can race with
+	 * vfio_pci_dma_buf_cleanup(), meaning priv->vdev might be
+	 * NULL, and the VFIO device registration might have been
+	 * dropped.
+	 *
+	 * With the goal of taking vdev->memory_lock in a world where
+	 * vdev might not still exist:
+	 *
+	 * 1. Take the resv lock on the DMABUF:
+	 *    - If racing cleanup got in first, vdev == NULL and buffer
+	 *      is revoked; stop/exit if so.
+	 *    - If we got in first, vdev is non-NULL, accessible, and
+	 *      cleanup _has not yet put the VFIO device registration_,
+	 *      so the device refcount must be >0.
+	 *
+	 * 2. Take vfio_device registration (refcount guaranteed >0
+	 *    hereafter).
+	 *
+	 * 3. Unlock the DMABUF's resv lock:
+	 *    - A racing cleanup can now complete.
+	 *    - But, the device refcount >0, meaning the vfio_device
+	 *      (and vfio_pci_core_device vdev) have not yet been
+	 *      freed. vdev is accessible, even if the DMABUF has been
+	 *      revoked or cleanup has happened, because
+	 *      vfio_unregister_group_dev() can't complete.
+	 *
+	 * 4. Take the vdev->memory_lock
+	 *    - Either the DMABUF is usable, or has been cleaned up.
+	 *      Whichever, it can no longer change under us.
+	 *    - Test the DMABUF revocation status again: if it was
+	 *      revoked between 1 and 4 return a SIGBUS. Otherwise,
+	 *      return a PFN.
+	 *    - It's not necessary to also take the resv lock, because
+	 *      the status/vdev can't change while memory_lock is held.
+	 *
+	 * 5. Unlock, done.
 	 */
+
+	dma_resv_lock(priv->dmabuf->resv, NULL);
 	vdev = READ_ONCE(priv->vdev);
 
+	if (READ_ONCE(priv->revoked) || !vdev) {
+		pr_debug_ratelimited("%s VA 0x%lx, pgoff 0x%lx: DMABUF revoked/cleaned up\n",
+				     __func__, vmf->address, vma->vm_pgoff);
+		dma_resv_unlock(priv->dmabuf->resv);
+		return VM_FAULT_SIGBUS;
+	}
+	/* vdev is usable */
+
+	if (!vfio_device_try_get_registration(&vdev->vdev)) {
+		/*
+		 * If vdev != NULL (above), the registration should
+		 * already be >0 and so this try_get should never
+		 * fail.
+		 */
+		dev_warn(&vdev->pdev->dev, "%s: Unexpected registration failure\n",
+			 __func__);
+		dma_resv_unlock(priv->dmabuf->resv);
+		return VM_FAULT_SIGBUS;
+	}
+	dma_resv_unlock(priv->dmabuf->resv);
+
 	scoped_guard(rwsem_read, &vdev->memory_lock) {
-		if (!priv->revoked) {
+		if (!READ_ONCE(priv->revoked)) {
 			int pres = vfio_pci_dma_buf_find_pfn(priv, vma,
 							     vmf->address,
 							     order, &pfn);
@@ -1734,6 +1792,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
 			    vma->vm_pgoff, (unsigned int)ret);
 	}
 
+	vfio_device_put_registration(&vdev->vdev);
 	return ret;
 }
 
@@ -1742,7 +1801,7 @@ static vm_fault_t vfio_pci_mmap_page_fault(struct vm_fault *vmf)
 	return vfio_pci_mmap_huge_fault(vmf, 0);
 }
 
-static const struct vm_operations_struct vfio_pci_mmap_ops = {
+const struct vm_operations_struct vfio_pci_mmap_ops = {
 	.fault = vfio_pci_mmap_page_fault,
 #ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
 	.huge_fault = vfio_pci_mmap_huge_fault,
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 04c7733fe712..cc477f46a7d5 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -27,6 +27,33 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 
 	return 0;
 }
+
+static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	u64 req_len, req_start;
+
+	if (priv->revoked)
+		return -ENODEV;
+	if ((vma->vm_flags & VM_SHARED) == 0)
+		return -EINVAL;
+
+	req_len = vma->vm_end - vma->vm_start;
+	req_start = vma->vm_pgoff << PAGE_SHIFT;
+	if (req_start + req_len > priv->size)
+		return -EINVAL;
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
+	/* See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. */
+	vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
+		     VM_DONTEXPAND | VM_DONTDUMP);
+	vma->vm_private_data = priv;
+	vma->vm_ops = &vfio_pci_mmap_ops;
+
+	return 0;
+}
 #endif /* CONFIG_VFIO_PCI_DMABUF */
 
 static void vfio_pci_dma_buf_done(struct kref *kref)
@@ -94,6 +121,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
 static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
 #ifdef CONFIG_VFIO_PCI_DMABUF
 	.attach = vfio_pci_dma_buf_attach,
+	.mmap = vfio_pci_dma_buf_mmap,
 #endif
 	.map_dma_buf = vfio_pci_dma_buf_map,
 	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index a8edbee6ce56..f837d6c8bddc 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -37,6 +37,8 @@ struct vfio_pci_dma_buf {
 	u8 revoked : 1;
 };
 
+extern const struct vm_operations_struct vfio_pci_mmap_ops;
+
 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
 
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 8/9] vfio/pci: Permanently revoke a DMABUF on request
Date: Thu, 16 Apr 2026 06:17:51 -0700
Message-ID: <20260416131815.2729131-9-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Expand the
VFIO DMABUF revocation state to three states: Not revoked, temporarily revoked, and permanently revoked. The first two are for existing transient revocation, e.g. across a function reset, and the DMABUF is put into the last in response to a new ioctl(VFIO_DEVICE_PCI_DMABUF_REVOKE) request. This VFIO device fd ioctl passes a DMABUF by fd and requests that the DMABUF is permanently revoked. On success, it's guaranteed that the buffer can never be imported/attached/mmap()ed in future, that dynamic imports have been cleanly detached, and all mappings made inaccessible/PTEs zapped. This is useful for lifecycle management, to reclaim VFIO PCI BAR ranges previously delegated to a subordinate client process: The driver process can ensure that the loaned resources are revoked when the client is deemed "done", and exported ranges can be safely re-used elsewhere. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_core.c | 21 +++- drivers/vfio/pci/vfio_pci_dmabuf.c | 158 +++++++++++++++++++++-------- drivers/vfio/pci/vfio_pci_priv.h | 14 ++- include/uapi/linux/vfio.h | 30 ++++++ 4 files changed, 179 insertions(+), 44 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index cad126cf8737..59582fcfba97 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1461,6 +1461,21 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_= core_device *vdev, ioeventfd.fd); } =20 +static int vfio_pci_ioctl_dmabuf_revoke(struct vfio_pci_core_device *vdev, + struct vfio_pci_dmabuf_revoke __user *arg) +{ + unsigned long minsz =3D offsetofend(struct vfio_pci_dmabuf_revoke, dmabuf= _fd); + struct vfio_pci_dmabuf_revoke revoke; + + if (copy_from_user(&revoke, arg, minsz)) + return -EFAULT; + + if (revoke.argsz < minsz) + return -EINVAL; + + return vfio_pci_dma_buf_revoke(vdev, revoke.dmabuf_fd); +} + long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, unsigned long arg) { @@ -1483,6 +1498,8 @@ 
long vfio_pci_core_ioctl(struct vfio_device *core_vde= v, unsigned int cmd, return vfio_pci_ioctl_reset(vdev, uarg); case VFIO_DEVICE_SET_IRQS: return vfio_pci_ioctl_set_irqs(vdev, uarg); + case VFIO_DEVICE_PCI_DMABUF_REVOKE: + return vfio_pci_ioctl_dmabuf_revoke(vdev, uarg); default: return -ENOTTY; } @@ -1752,7 +1769,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_= fault *vmf, dma_resv_lock(priv->dmabuf->resv, NULL); vdev =3D READ_ONCE(priv->vdev); =20 - if (READ_ONCE(priv->revoked) || !vdev) { + if (READ_ONCE(priv->status) !=3D VFIO_PCI_DMABUF_OK || !vdev) { pr_debug_ratelimited("%s VA 0x%lx, pgoff 0x%lx: DMABUF revoked/cleaned u= p\n", __func__, vmf->address, vma->vm_pgoff); dma_resv_unlock(priv->dmabuf->resv); @@ -1774,7 +1791,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_= fault *vmf, dma_resv_unlock(priv->dmabuf->resv); =20 scoped_guard(rwsem_read, &vdev->memory_lock) { - if (!READ_ONCE(priv->revoked)) { + if (READ_ONCE(priv->status) =3D=3D VFIO_PCI_DMABUF_OK) { int pres =3D vfio_pci_dma_buf_find_pfn(priv, vma, vmf->address, order, &pfn); diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index cc477f46a7d5..48ec4da2db8b 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -19,7 +19,7 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, if (!attachment->peer2peer) return -EOPNOTSUPP; =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 if (!dma_buf_attach_revocable(attachment)) @@ -33,7 +33,7 @@ static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, = struct vm_area_struct * struct vfio_pci_dma_buf *priv =3D dmabuf->priv; u64 req_len, req_start; =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; if ((vma->vm_flags & VM_SHARED) =3D=3D 0) return -EINVAL; @@ -73,7 +73,7 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachmen= t, =20 dma_resv_assert_held(priv->dmabuf->resv); =20 - 
if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return ERR_PTR(-ENODEV); =20 ret =3D dma_buf_phys_vec_to_sgt(attachment, priv->provider, @@ -270,7 +270,8 @@ static int vfio_pci_dmabuf_export(struct vfio_pci_core_= device *vdev, INIT_LIST_HEAD(&priv->dmabufs_elm); down_write(&vdev->memory_lock); dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D !__vfio_pci_memory_enabled(vdev); + priv->status =3D __vfio_pci_memory_enabled(vdev) ? VFIO_PCI_DMABUF_OK : + VFIO_PCI_DMABUF_TEMP_REVOKED; list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); dma_resv_unlock(priv->dmabuf->resv); up_write(&vdev->memory_lock); @@ -301,7 +302,7 @@ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachm= ent *attachment, return -EOPNOTSUPP; =20 priv =3D attachment->dmabuf->priv; - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 /* More than one range to iommufd will require proper DMABUF support */ @@ -581,6 +582,64 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_cor= e_device *vdev, return ret; } =20 +static void __vfio_pci_dma_buf_revoke(struct vfio_pci_dma_buf *priv, bool = revoked, + bool permanently) +{ + bool was_revoked; + + lockdep_assert_held_write(&priv->vdev->memory_lock); + + if ((priv->status =3D=3D VFIO_PCI_DMABUF_PERM_REVOKED) || + (priv->status =3D=3D VFIO_PCI_DMABUF_OK && !revoked) || + (priv->status =3D=3D VFIO_PCI_DMABUF_TEMP_REVOKED && revoked && !perm= anently)) { + return; + } + + dma_resv_lock(priv->dmabuf->resv, NULL); + was_revoked =3D priv->status !=3D VFIO_PCI_DMABUF_OK; + + if (revoked) + priv->status =3D permanently ? + VFIO_PCI_DMABUF_PERM_REVOKED : VFIO_PCI_DMABUF_TEMP_REVOKED; + + /* + * If TEMP_REVOKED is being upgraded to PERM_REVOKED, the + * buffer is already gone. Don't wait on it again. 
+ */ + if (was_revoked && revoked) { + dma_resv_unlock(priv->dmabuf->resv); + return; + } + + dma_buf_invalidate_mappings(priv->dmabuf); + dma_resv_wait_timeout(priv->dmabuf->resv, + DMA_RESV_USAGE_BOOKKEEP, false, + MAX_SCHEDULE_TIMEOUT); + dma_resv_unlock(priv->dmabuf->resv); + if (revoked) { + kref_put(&priv->kref, vfio_pci_dma_buf_done); + wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); + } else { + /* + * Kref is initialize again, because when revoke + * was performed the reference counter was decreased + * to zero to trigger completion. + */ + kref_init(&priv->kref); + /* + * There is no need to wait as no mapping was + * performed when the previous status was + * priv->status =3D=3D *REVOKED. + */ + reinit_completion(&priv->comp); + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->status =3D VFIO_PCI_DMABUF_OK; + dma_resv_unlock(priv->dmabuf->resv); + } +} + void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) { struct vfio_pci_dma_buf *priv; @@ -589,45 +648,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_devic= e *vdev, bool revoked) lockdep_assert_held_write(&vdev->memory_lock); /* * Holding memory_lock ensures a racing VMA fault observes - * priv->revoked properly. + * priv->status properly. 
*/ =20 list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!get_file_active(&priv->dmabuf->file)) continue; - - if (priv->revoked !=3D revoked) { - dma_resv_lock(priv->dmabuf->resv, NULL); - if (revoked) - priv->revoked =3D true; - dma_buf_invalidate_mappings(priv->dmabuf); - dma_resv_wait_timeout(priv->dmabuf->resv, - DMA_RESV_USAGE_BOOKKEEP, false, - MAX_SCHEDULE_TIMEOUT); - dma_resv_unlock(priv->dmabuf->resv); - if (revoked) { - kref_put(&priv->kref, vfio_pci_dma_buf_done); - wait_for_completion(&priv->comp); - unmap_mapping_range(priv->dmabuf->file->f_mapping, - 0, priv->size, 1); - } else { - /* - * Kref is initialize again, because when revoke - * was performed the reference counter was decreased - * to zero to trigger completion. - */ - kref_init(&priv->kref); - /* - * There is no need to wait as no mapping was - * performed when the previous status was - * priv->revoked =3D=3D true. - */ - reinit_completion(&priv->comp); - dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D false; - dma_resv_unlock(priv->dmabuf->resv); - } - } + __vfio_pci_dma_buf_revoke(priv, revoked, false); fput(priv->dmabuf->file); } } @@ -647,8 +674,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) dma_resv_lock(priv->dmabuf->resv, NULL); list_del_init(&priv->dmabufs_elm); priv->vdev =3D NULL; - was_revoked =3D priv->revoked; - priv->revoked =3D true; + was_revoked =3D (priv->status !=3D VFIO_PCI_DMABUF_OK); + priv->status =3D VFIO_PCI_DMABUF_PERM_REVOKED; dma_buf_invalidate_mappings(priv->dmabuf); dma_resv_wait_timeout(priv->dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, @@ -665,3 +692,52 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_dev= ice *vdev) } up_write(&vdev->memory_lock); } + +#ifdef CONFIG_VFIO_PCI_DMABUF +int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vdev, int dmabuf_= fd) +{ + struct dma_buf *dmabuf; + struct vfio_pci_dma_buf *priv; + int ret =3D 0; + + dmabuf =3D dma_buf_get(dmabuf_fd); + if 
(IS_ERR(dmabuf)) + return PTR_ERR(dmabuf); + + /* + * Sanity-check the DMABUF is really a vfio_pci_dma_buf _and_ + * (below) relates to the VFIO device it was provided with: + */ + if (dmabuf->ops !=3D &vfio_pci_dmabuf_ops) { + ret =3D -ENODEV; + goto out_put_buf; + } + + priv =3D dmabuf->priv; + + scoped_guard(rwsem_write, &vdev->memory_lock) { + struct vfio_pci_core_device *db_vdev =3D READ_ONCE(priv->vdev); + + /* + * Reading priv->vdev inside the lock is conservative, + * because cleanup (changes vdev) is (today) prevented + * from running concurrently by the VFIO device fd + * being held open by the caller, ioctl. + */ + if (!db_vdev || db_vdev !=3D vdev) { + ret =3D -ENODEV; + break; + } + + if (priv->status =3D=3D VFIO_PCI_DMABUF_PERM_REVOKED) + ret =3D -EBADFD; + else + __vfio_pci_dma_buf_revoke(priv, true, true); + } + + out_put_buf: + dma_buf_put(dmabuf); + + return ret; +} +#endif /* CONFIG_VFIO_PCI_DMABUF */ diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index f837d6c8bddc..eac5606ca161 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -23,6 +23,12 @@ struct vfio_pci_ioeventfd { bool test_mem; }; =20 +enum vfio_pci_dma_buf_status { + VFIO_PCI_DMABUF_OK =3D 0, + VFIO_PCI_DMABUF_TEMP_REVOKED =3D 1, + VFIO_PCI_DMABUF_PERM_REVOKED =3D 2, +}; + struct vfio_pci_dma_buf { struct dma_buf *dmabuf; struct vfio_pci_core_device *vdev; @@ -34,7 +40,7 @@ struct vfio_pci_dma_buf { u32 nr_ranges; struct kref kref; struct completion comp; - u8 revoked : 1; + enum vfio_pci_dma_buf_status status; }; =20 extern const struct vm_operations_struct vfio_pci_mmap_ops; @@ -147,6 +153,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device = *vdev, bool revoked); int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, struct vfio_device_feature_dma_buf __user *arg, size_t argsz); +int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vdev, int dmabuf_= fd); #else static 
inline int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, @@ -155,6 +162,11 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_dev= ice *vdev, u32 flags, { return -ENOTTY; } +static inline int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vde= v, + int dmabuf_fd) +{ + return -ENODEV; +} #endif =20 #endif diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 5de618a3a5ee..77225ed8115f 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1321,6 +1321,36 @@ struct vfio_precopy_info { =20 #define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21) =20 +/** + * VFIO_DEVICE_PCI_DMABUF_REVOKE - _IO(VFIO_TYPE, VFIO_BASE + 22) + * + * This ioctl is used on the device FD, and requests that access to + * the buffer corresponding to the DMABUF FD parameter is immediately + * and permanently revoked. On successful return, the buffer is not + * accessible through any mmap() or dma-buf import. The request fails + * if the buffer is pinned; otherwise, the exporter marks the buffer + * as inaccessible and uses the move_notify callback to inform + * importers of the change. The buffer is permanently disabled, and + * VFIO refuses all map, mmap, attach, etc. requests. + * + * Returns: + * + * Return: 0 on success, -1 and errno set on failure: + * + * ENODEV if the associated dmabuf FD no longer exists/is closed, + * or is not a DMABUF created for this device. + * EINVAL if the dmabuf_fd parameter isn't a DMABUF. + * EBADF if the dmabuf_fd parameter isn't a valid file number. + * EBADFD if the buffer has already been revoked. + * + */ +struct vfio_pci_dmabuf_revoke { + __u32 argsz; + __u32 dmabuf_fd; +}; + +#define VFIO_DEVICE_PCI_DMABUF_REVOKE _IO(VFIO_TYPE, VFIO_BASE + 22) + /* * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low p= ower * state with the platform-based power management. 
Device use of lower power
-- 
2.47.3

From nobody Tue Jun 16 04:59:01 2026
From: Matt Evans
To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro, Christian König
CC: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal, Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple, Vivek Kasireddy
Subject: [PATCH 9/9] vfio/pci: Add mmap() attributes to DMABUF feature
Date: Thu, 16 Apr 2026 06:17:52 -0700
Message-ID: <20260416131815.2729131-10-mattev@meta.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
References: <20260416131815.2729131-1-mattev@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

A new field is reserved in vfio_device_feature_dma_buf.flags to
request CPU-facing memory type attributes for mmap()s of the buffer.
Add a flag VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC, which results in WC
PTEs for the DMABUF's BAR region.

Signed-off-by: Matt Evans
Reviewed-by: Jason Gunthorpe
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 15 +++++++++++++--
 drivers/vfio/pci/vfio_pci_priv.h   |  1 +
 include/uapi/linux/vfio.h          | 12 +++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 48ec4da2db8b..00cedfe3a57d 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -43,7 +43,10 @@ static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *
 	if (req_start + req_len > priv->size)
 		return -EINVAL;
 
-	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	if (priv->attrs == VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+	else
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
 	/* See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. */
@@ -370,6 +373,12 @@ static int validate_dmabuf_input(struct vfio_device_feature_dma_buf *dma_buf,
 	size_t length = 0;
 	u32 i;
 
+	if ((dma_buf->flags != 0) &&
+	    ((dma_buf->flags & ~VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) ||
+	     ((dma_buf->flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) !=
+	      VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC)))
+		return -EINVAL;
+
 	for (i = 0; i < dma_buf->nr_ranges; i++) {
 		u64 offset = dma_ranges[i].offset;
 		u64 len = dma_ranges[i].length;
@@ -413,7 +422,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
 		return -EFAULT;
 
-	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
+	if (!get_dma_buf.nr_ranges)
 		return -EINVAL;
 
 	/*
@@ -457,6 +466,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	priv->vdev = vdev;
 	priv->nr_ranges = get_dma_buf.nr_ranges;
 	priv->size = length;
+	priv->attrs = get_dma_buf.flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK;
 	ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
 					     get_dma_buf.region_index,
 					     priv->phys_vec, dma_ranges,
@@ -542,6 +552,7 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev,
 	 */
 	priv->vdev = vdev;
 	priv->nr_ranges = nr_ranges;
+	priv->attrs = 0;
 	priv->size = (vma->vm_pgoff << PAGE_SHIFT) + req_len;
 	priv->provider = pcim_p2pdma_provider(vdev->pdev, res_index);
 	if (!priv->provider) {
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index eac5606ca161..aeffd9f7f3b5 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -40,6 +40,7 @@ struct vfio_pci_dma_buf {
 	u32 nr_ranges;
 	struct kref kref;
 	struct completion comp;
+	u32 attrs;
 	enum vfio_pci_dma_buf_status status;
 };
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 77225ed8115f..93eef95dc7f3 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1535,7 +1535,9 @@ struct vfio_device_feature_bus_master {
  * etc. offset/length specify a slice of the region to create the dmabuf from.
  * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
  *
- * flags should be 0.
+ * flags contains:
+ * - A field for userspace mapping attribute: by default, suitable for regular
+ *   MMIO. Alternate attributes (such as WC) can be selected.
  *
  * Return: The fd number on success, -1 and errno is set on failure.
  */
@@ -1549,8 +1551,12 @@ struct vfio_region_dma_range {
 struct vfio_device_feature_dma_buf {
 	__u32	region_index;
 	__u32	open_flags;
-	__u32   flags;
-	__u32   nr_ranges;
+	__u32	flags;
+	/* Flags sub-field reserved for attribute enum */
+#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK	(0xfU << 28)
+#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_UC	(0 << 28)
+#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC	(1 << 28)
+	__u32	nr_ranges;
 	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
 };
 
-- 
2.47.3
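
For readers following the flags handling in patch 9/9: the check added to
validate_dmabuf_input() appears to accept exactly two values of flags, 0
(the default, uncached mapping) and VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC.
Any bit outside the attribute field, or any other value inside it, gets
-EINVAL. Below is a minimal userspace sketch of that rule, not the kernel
code itself. The DMA_BUF_ATTR_* macros are local copies of the values in
the patch, and check_dma_buf_flags() and attr_is_wc() are hypothetical
helper names introduced only for illustration, so that the block builds
without the patched <linux/vfio.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Local copies of the values proposed in the patch. The shortened names
 * are hypothetical; the real UAPI names carry the
 * VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_ prefix.
 */
#define DMA_BUF_ATTR_MASK	(0xfU << 28)
#define DMA_BUF_ATTR_UC		(0U << 28)
#define DMA_BUF_ATTR_WC		(1U << 28)

/* The WC value must live entirely inside the attribute field. */
static_assert((DMA_BUF_ATTR_WC & ~DMA_BUF_ATTR_MASK) == 0,
	      "WC attribute must fit in the attribute mask");

/*
 * Hypothetical helper: returns 0 if flags is acceptable, -EINVAL
 * otherwise. Written as two separate tests, but it accepts the same
 * set of values as the single compound condition in the patch.
 */
static int check_dma_buf_flags(uint32_t flags)
{
	uint32_t attr = flags & DMA_BUF_ATTR_MASK;

	/* No bits are defined outside the attribute field. */
	if (flags & ~DMA_BUF_ATTR_MASK)
		return -EINVAL;

	/* Only the UC (default) and WC attribute values are defined. */
	if (attr != DMA_BUF_ATTR_UC && attr != DMA_BUF_ATTR_WC)
		return -EINVAL;

	return 0;
}

/*
 * Hypothetical helper mirroring the mmap-time choice: write-combining
 * when WC was requested, uncached otherwise.
 */
static int attr_is_wc(uint32_t flags)
{
	return (flags & DMA_BUF_ATTR_MASK) == DMA_BUF_ATTR_WC;
}
```

Splitting the test in two makes it easier to see where a future attribute
value would be added: only the second comparison would need to grow.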