From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5FA8A3FFAB7; Thu, 12 Mar 2026 18:46:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341206; cv=none; b=AS3+lmUyBOWhnPWa1t3HsRy++ZUAhTjy9+NNA/ghCfbCt8/PPetC8HidINB3IFmdxh5lvkVY0Ts9ayOqm5sAybHeiW72Rh1QCUuo60neQbZOwIBhXkOZIS/Tt4haJnmmiUgK9SLQHy2tsvLb57XyNbMMgW8rVlUZrj05zm/rOrQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341206; c=relaxed/simple; bh=ZROWNSa59nUsl5X/QgDcGHcaR/c4lJYMdLrJTNBrpdk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BJvLdlI42nQlwZ36Y/fnW1yJhJMAGsfr+IohZlfzo0947erLnh6a6JAQCRPUG/BIjg3tMA1sT+E93CuiRWmGqawJAKLypTG+KeCyh0ymBOIeqb2gi8lX9nF93IqVN8y/2CBPqV9kSOmddez7N98SfhRuonA1AIhZMAmbbVskgnM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=pYL0HTRz; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="pYL0HTRz" Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 62CH37c93822056; Thu, 12 Mar 2026 11:46:30 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to 
:message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=lyheLk9Wvs0fvD4s1b5iebQGWNGSW8pkLBga2jwoIMs=; b=pYL0HTRziDvE ieE6PsbkKCshsCYWNDD1wtNcjCEhoUDkzVT5WnfN5H+Urq+PoLq9kJ7m21Tcsulc CE41Dz6/6BW5PB+oJZhpw54Bp8AYSxCxYwDjjfR17KL8STWRYwR/1cX/HuWqN1Ed wQytqiM++CkCBXQhbarMSlzz7PpJjfAYm/76oCYJ1+2gX66Cu0q3/qMrBvGPFfAz AhpWlckCOXKsEdfaCp+iMNzxlaGCwY9LWFOSRjMVgu0pX7ZAxpgKAgpKoRHv1MQy h1s+sdnzTvGdWJZEXnR2lXiCehzAqZD/OaGTQsCD3RH7bXE+lGN/6DZcXxZXKQ58 FiiQxXP11Q== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cv1j2j8et-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:30 -0700 (PDT) Received: from localhost (2620:10d:c085:208::7cb7) by mail.thefacebook.com (2620:10d:c08b:78::c78f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:29 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 01/10] vfio/pci: Set up VFIO barmap before creating a DMABUF Date: Thu, 12 Mar 2026 11:45:59 -0700 Message-ID: <20260312184613.3710705-2-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: PvXIK7001yah_q_sVCLHPSoZDp3mLo2e X-Authority-Analysis: v=2.4 cv=ALq93nRn c=1 sm=1 tr=0 ts=69b30a06 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 
a=crHB47gyY4rKiduisYu9:22 a=VabnemYjAAAA:8 a=DkskzI1_-XwU4soQF-MA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX8djASQHH+VAA ltjJlqmH55EDbWfxbbJ7FyTuvxqN6vMHePt/cF2tSVoxk6bkjkTwC8QjvpTwsT+RD9YMljBSaK/ 36OsJTKf1QwsIQaUMPiSF0WgBUCs2IfZNBpOevxB+IcKzDPftiEmHtW6c7yvvertvgCTar5vur1 d2Mhx0mRzrpATABoUz7byz5ekuDjE2ULkxl/3npaeLDswE2K4nDrSqN/I/sncUDG5VuHQ+Vz0ZF 2hZ80M9bF0gZ7E+jmCV9iL4lqBZZnzS1quxPjHn745eOrle3T8vm3yZsmn5P9cEhBCRpVhMK5s1 12YSduE3T56nfBeqmjxXEz3yiR4C5SKEaqAB1/E7IaifPUB/T+dvyG+IeWodMtzH6kEBwUS3X5s rgKXzYRfW8SLe+g6Xetq6+3eZ5KAYhPBIwJMjJbd8ZGGYXHd7RTGjfl20sHcfL54eCH/Ck4PiB+ Mzp/l+X46195EIqFNWg== X-Proofpoint-ORIG-GUID: PvXIK7001yah_q_sVCLHPSoZDp3mLo2e X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" A DMABUF exports access to BAR resources which need to be requested before the DMABUF is handed out. Usually the resources are requested when setting up the barmap when the VFIO device fd is mmap()ed, but there's no guarantee that's done before a DMABUF is created. Set up the barmap (and so request resources) in the DMABUF-creation path. Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO region= s") Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 3a803923141b..44558cc2948e 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -269,6 +269,17 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core= _device *vdev, u32 flags, goto err_free_priv; } =20 + /* + * Just like the vfio_pci_core_mmap() path, we need to ensure + * PCI regions have been requested before returning DMABUFs + * that reference them. 
It's possible to create a DMABUF for + * a BAR without the BAR having already been mmap()ed. The + * barmap setup requests the regions for us: + */ + ret =3D vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index); + if (ret) + goto err_free_phys; + priv->vdev =3D vdev; priv->nr_ranges =3D get_dma_buf.nr_ranges; priv->size =3D length; --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 83B323FD12C; Thu, 12 Mar 2026 18:46:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341207; cv=none; b=cydkNYQ0OcfIRN36EqS//is7GC8ruwqWepvMAJD+9gPBl5X9dG2RnZkzLYBFcWYP5UTwGjKDkxqSJX1lRkTGyiVIynMrov2qn37TxKnqVlsJxM5yE+BxLzmbYU4rUT3ny6V0hNtIub21lg6xcQUfBEmN62bqfSdzRPghDo31l6Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341207; c=relaxed/simple; bh=SlPadT6Oif2I8Ldrfkcy50pxboNZmXXA8d3IUOT2bv4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FTm0ZDwJXFAdXoMmiIB25rBPi2zZOxXrkOyO2npt1/cx+dlBrvOwIbjzHH6rqGthFX/9eP5FM72XgBRkgjTGq3aQDf7wzQ3d+apMfZtYxA4CUaTmqqEIHj+feH/EJEgsP1/VdDoHD3g16qBStYFC8CocBDTboF4FIm7V2vnh2Ps= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=AKkjswGo; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=meta.com header.i=@meta.com header.b="AKkjswGo" Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 62CHrWqW2853764; Thu, 12 Mar 2026 11:46:33 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=yABRgDa4Smt7xH35V/HV2UTHouYqNi0/j2J9K5YgcvI=; b=AKkjswGoDJeJ 3LFVTzysksvweic7lsg+Q281vwoLUjw5DyDTrrb+jaTG9HKMB/vURPWqdTGsheJW mnjkiuo6TgTIQXZ2b5MjJNNOTOlLNa/CAFWuj9yTaa0NOdNSynTpEukQMHT8wJro Oxf++2ulrANLf+S4HXIRVMUrE5ZWDvzTuUWeauYrOvBuN0vRpt5wMN0M7BV9tUmS ZJ4OvOqi4Tc8GWIoQ+aV+z5YaCRvIXL1hESLHawyPtIDSixBk4NjxZjNClKFXYCI UMEYKzr4Ycz1jTDeYVBQsESvL0RNwiD1VinZPJENU6fv3O5pPbqq9gyjkK4UzIux kXZGn9k3nw== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cv29nh38v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:32 -0700 (PDT) Received: from localhost (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:32 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 02/10] vfio/pci: Clean up DMABUFs before disabling function Date: Thu, 12 Mar 2026 11:46:00 -0700 Message-ID: <20260312184613.3710705-3-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: T8Yuc9PQqL6Z-6JxP8LV4CMEZQgXG9hN X-Proofpoint-ORIG-GUID: T8Yuc9PQqL6Z-6JxP8LV4CMEZQgXG9hN X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX7XzcAXsQ5Hr/ waDgvCGVXfyPZksrBh7BebX5id10i4wDMWPzJC1L0EbTSXpHYLMfpRY4PGPcXSdVwkI+5HxdsjQ xYnUqKCqgBd3UcZEJiwK3SrWO50h1PgXcXaxmfjp8yKwI0H/Z8BAUZC4buJNfAl4UaqJQSwdYrB afVvUp0fU2aUJIXRD/+HsmMAAvqtnfA3kidQflYpbq1IoK+6nGkuyWzcNPM2KN84Fbqv9fJkTjh kc9Mw76bhQAVQQ7qIn3CKclIT+jIc9UATBYZEgTSG8vAojpJO9yBqVsW9uHx4CJqeMfGSA+lig/ Qt+yJtfXJGF1MY7bCrUrTP5eZvpiOwmb7T3lNqeWqfKlGUIk5SblFPojq7wbpCylvoa2VHaiMiV c6Os+n/hClffeC6nXzOMLTI5/zRGJS+4Fy52RYPGqXfUOu2DVoXvOUDns77xFc35Ca3G+JI8QdE zT3ZUoo4hVc2qAy6Ing== X-Authority-Analysis: v=2.4 cv=FvYIPmrq c=1 sm=1 tr=0 ts=69b30a09 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=03ozwUkBphtHgyqjj1sw:22 a=VabnemYjAAAA:8 a=4DpnPWOeRWuCnonzwFEA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" On device shutdown, make vfio_pci_core_close_device() call vfio_pci_dma_buf_cleanup() before the function is disabled via vfio_pci_core_disable(). This ensures that all access via DMABUFs is revoked before the function's BARs become inaccessible. This fixes an issue where, if the function is disabled first, a tiny window exists in which the function's MSE is cleared and yet BARs could still be accessed via the DMABUF. The resources would also be freed and up for grabs by a different driver. 
Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO region= s") Signed-off-by: Matt Evans Reviewed-by: Jason Gunthorpe --- drivers/vfio/pci/vfio_pci_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index d43745fe4c84..f9ed3374d268 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -734,10 +734,10 @@ void vfio_pci_core_close_device(struct vfio_device *c= ore_vdev) #if IS_ENABLED(CONFIG_EEH) eeh_dev_release(vdev->pdev); #endif - vfio_pci_core_disable(vdev); - vfio_pci_dma_buf_cleanup(vdev); =20 + vfio_pci_core_disable(vdev); + mutex_lock(&vdev->igate); vfio_pci_eventfd_replace_locked(vdev, &vdev->err_trigger, NULL); vfio_pci_eventfd_replace_locked(vdev, &vdev->req_trigger, NULL); --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BA163B38B6; Thu, 12 Mar 2026 18:46:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341221; cv=none; b=GEnlg1Buny9kEr49HqJDUV15X9VmEY2bFuLArGXhZh+UAngCvlgK8vTabTBWMALQdJZAjrYVCjjZu4T6G/jkS6iflwP3qtkrHf2VEOFII7+GJ9dWMKX5D3MRvhp9FTlULKmYpU9oFDze2oSfFlUgZ3sjWn9683ds1vduhSajGCM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341221; c=relaxed/simple; bh=t6bDpi4rb/pOu7H+QT7goQOzdqvFfPwsK5zs2XcTZNU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FpdRcE1IbKlPSCijsUlo0jbyj1y/kqA1VkQo1ahRL0wP1QhQqf5nvJSVEHGcUN00BNHg0W4wIKf97ErPPcorzV7zz3QMyGRVC6YnXYb170YwV/NC9SADhexcn7K3w6MiBlrUI87ivjYg6/kGsEbAUxhk27JQvfuQtWlqeU88Izo= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=tGmcNPiF; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="tGmcNPiF" Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.18.1.11/8.18.1.11) with ESMTP id 62CHs6hE736960; Thu, 12 Mar 2026 11:46:40 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=F5iP07MryKL+RBsKPNOrGIpOE0QiFHOMNY2skfwri+I=; b=tGmcNPiFi4vV t6FnUDDv+vLa3kMO3p0i07iWoKFn9rsEAf42YjvI6n69DHDOWEVuEXJq4B+WdLqo 1910n5MQVkqLlCQiMg2nayCgF2jcKPfAd1Znem1lwvKniN7RwYAgBnHVfzuffMKw yKpC5QR7r0TKXqzkUHHIv819vX9zASDRjcJG7iXBK1Y4jFpjs+EXWpP58i+9597P SrOYwvNldvEMQ1xeozyCpy0dcXLyq9O4qWipxsAAFaPZ7Tk2S5vTZywzjKtBTgqK elnQ1PeVbZ1z5IclB3ZtOchhfcQ/jAr96ohdb2/YEny4nj6CeVVQov6mU4uXCp12 zIxRAqY0nQ== Received: from maileast.thefacebook.com ([163.114.135.16]) by m0001303.ppops.net (PPS) with ESMTPS id 4cv29xs1t8-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:40 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:35 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian 
, Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 03/10] vfio/pci: Add helper to look up PFNs for DMABUFs Date: Thu, 12 Mar 2026 11:46:01 -0700 Message-ID: <20260312184613.3710705-4-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-ORIG-GUID: 9y-dGupEJZ0nM22C7yEmPbFzaVwEVxI5 X-Proofpoint-GUID: 9y-dGupEJZ0nM22C7yEmPbFzaVwEVxI5 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX7og+xCebr1Z8 9wGO5s5UEdE2Yeslw0vblbNfCiRwbJoSvgo9nlWQT8cVXzF5sfLQZqTfIzPzNQ6p/UQMP5Iv1nh WluUZEZs+iSxIJgAWsDtwjaEa2jHNi3ULa0b/Kt24ZApxDXGi2h3EmLrb/R14FPgHtbtkbecHE/ w3hY/3gQb7N/8ylpDY7eoxpXUaJfVtPYmdadjVtqE1WzzVbXVhTrrJDNpc0+xiv4uwX0AVP3sdY /tVZ72jJjyRxDNKRILGF123tgkRQgyzovZ7KkOwnzSO62y6/5jiAUTXsZ9ZNQel4PluK714LwT8 QrB0gjPihfTYndRSJKNNZw9naQDsm37u8kj2g2iGR5tfVI9H7BCcFxzLWFxrL2w6nVKbekF1NQF HFRyr2izqkyY0xYmrjY3+s3q7warmVesoHMcYw2lR4IMMurs7dEd6ShEEskYQchI8gZDLdd591W 8a842RHSrLPEoAGZ4VQ== X-Authority-Analysis: v=2.4 cv=G4YR0tk5 c=1 sm=1 tr=0 ts=69b30a10 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=_78whYxrdx1mplLwxq1U:22 a=VabnemYjAAAA:8 a=FgeSMtyZeV4UKi14g-cA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Add a helper, vfio_pci_dma_buf_find_pfn(), which a VMA fault handler can use to find a PFN. This supports multi-range DMABUFs, which typically would be used to represent scattered spans but might even represent overlapping or aliasing spans of PFNs. 
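For illustration only, and not part of the patch: a minimal userspace sketch of how a lookup over concatenated ranges can work, assuming 4 KiB base pages. The names sketch_range and sketch_find_pfn and the -1/-2 return codes are hypothetical stand-ins; the kernel helper operates on struct phys_vec, derives the buffer offset from the VMA and fault address, and returns -EAGAIN or -EFAULT. The sketch takes a buffer offset directly and leaves out the VMA bounds and overflow checks.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the kernel's struct phys_vec. */
struct sketch_range {
	uint64_t paddr;	/* physical start address of the span */
	uint64_t len;	/* span length in bytes */
};

/*
 * Look up the PFN backing buffer offset 'buf_off' for a page made of
 * (1 << order) base pages. Ranges are laid out back to back in buffer
 * offset space starting at 0; their physical addresses may be in any
 * order.
 *
 * Returns 0 and fills *out_pfn on success, -1 if the PFN is not aligned
 * to the requested order (retry with a smaller order), and -2 if no
 * range wholly contains the page.
 */
static int sketch_find_pfn(const struct sketch_range *r, size_t nr,
			   uint64_t buf_off, unsigned int order,
			   uint64_t *out_pfn)
{
	const uint64_t pagesize = (uint64_t)1 << (SKETCH_PAGE_SHIFT + order);
	uint64_t base = 0;	/* buffer offset at which range i begins */
	size_t i;

	/* Round the offset down to the start of the (huge) page. */
	buf_off &= ~(pagesize - 1);

	for (i = 0; i < nr; i++) {
		if (buf_off >= base && buf_off + pagesize <= base + r[i].len) {
			uint64_t pfn = (r[i].paddr >> SKETCH_PAGE_SHIFT) +
				       ((buf_off - base) >> SKETCH_PAGE_SHIFT);

			/* A page of this order needs a PFN aligned to it. */
			if (pfn & (((uint64_t)1 << order) - 1))
				return -1;

			*out_pfn = pfn;
			return 0;
		}
		base += r[i].len;
	}

	return -2;
}
```

With two ranges { 0x100000, 0x2000 } and { 0x80000, 0x1000 }, buffer offset 0x2000 falls at the start of the second range even though that range is lower in physical memory, which is the scattered-span case described above.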
Because this is intended to be used in vfio_pci_core.c, we also need to expose the struct vfio_pci_dma_buf in the vfio_pci_priv.h header. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 102 +++++++++++++++++++++++++---- drivers/vfio/pci/vfio_pci_priv.h | 19 ++++++ 2 files changed, 108 insertions(+), 13 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 44558cc2948e..63140528dbea 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -9,19 +9,6 @@ =20 MODULE_IMPORT_NS("DMA_BUF"); =20 -struct vfio_pci_dma_buf { - struct dma_buf *dmabuf; - struct vfio_pci_core_device *vdev; - struct list_head dmabufs_elm; - size_t size; - struct phys_vec *phys_vec; - struct p2pdma_provider *provider; - u32 nr_ranges; - struct kref kref; - struct completion comp; - u8 revoked : 1; -}; - static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) { @@ -106,6 +93,95 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D= { .release =3D vfio_pci_dma_buf_release, }; =20 +int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf, + struct vm_area_struct *vma, + unsigned long address, + unsigned int order, + unsigned long *out_pfn) +{ + /* + * Given a VMA (start, end, pgoffs) and a fault address, + * search the corresponding DMABUF's phys_vec[] to find the + * range representing the address's offset into the VMA, and + * its PFN. + * + * The phys_vec[] ranges represent contiguous spans of VAs + * upwards from the buffer offset 0; the actual PFNs might be + * in any order, overlap/alias, etc. Calculate an offset of + * the desired page given VMA start/pgoff and address, then + * search upwards from 0 to find which span contains it. + * + * On success, a valid PFN for a page sized by 'order' is + * returned into out_pfn. 
+ * + * Failure occurs if: + * - The page would cross the edge of the VMA + * - The page isn't entirely contained within a range + * - We find a range, but the final PFN isn't aligned to the + * requested order. + * + * (Upon failure, the caller is expected to try again with a + * smaller order; the tests above will always succeed for + * order=3D0 as the limit case.) + * + * It's suboptimal if DMABUFs are created with neighbouring + * ranges that are physically contiguous, since hugepages + * can't straddle range boundaries. (The construction of the + * ranges vector should merge such ranges.) + */ + + const unsigned long pagesize =3D PAGE_SIZE << order; + unsigned long rounded_page_addr =3D address & ~(pagesize - 1); + unsigned long rounded_page_end =3D rounded_page_addr + pagesize; + unsigned long buf_page_offset; + unsigned long buf_offset =3D 0; + unsigned int i; + + if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) + return -EAGAIN; + + if (unlikely(check_add_overflow(rounded_page_addr - vma->vm_start, + vma->vm_pgoff << PAGE_SHIFT, &buf_page_offset))) + return -EFAULT; + + for (i =3D 0; i < vpdmabuf->nr_ranges; i++) { + unsigned long range_len =3D vpdmabuf->phys_vec[i].len; + unsigned long range_start =3D vpdmabuf->phys_vec[i].paddr; + + if (buf_page_offset >=3D buf_offset && + buf_page_offset + pagesize <=3D buf_offset + range_len) { + /* + * The faulting page is wholly contained + * within the span represented by the range. + * Validate PFN alignment for the order: + */ + unsigned long pfn =3D (range_start >> PAGE_SHIFT) + + ((buf_page_offset - buf_offset) >> PAGE_SHIFT); + + if (IS_ALIGNED(pfn, 1 << order)) { + *out_pfn =3D pfn; + return 0; + } + /* Retry with smaller order */ + return -EAGAIN; + } + buf_offset +=3D range_len; + } + + /* + * If we get here, the address fell outside of the span + * represented by the (concatenated) ranges.
Setup of a + * mapping must ensure that the VMA is <=3D the total size of + * the ranges, so this should never happen. But, if it does, + * force SIGBUS for the access and warn. + */ + WARN_ONCE(1, "No range for addr 0x%lx, order %d: VMA 0x%lx-0x%lx pgoff 0x= %lx, %d ranges, size 0x%lx\n", + address, order, vma->vm_start, vma->vm_end, vma->vm_pgoff, + vpdmabuf->nr_ranges, vpdmabuf->size); + + return -EFAULT; +} + /* * This is a temporary "private interconnect" between VFIO DMABUF and iomm= ufd. * It allows the two co-operating drivers to exchange the physical address= of diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 27ac280f00b9..5cc8c85a2153 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -23,6 +23,19 @@ struct vfio_pci_ioeventfd { bool test_mem; }; =20 +struct vfio_pci_dma_buf { + struct dma_buf *dmabuf; + struct vfio_pci_core_device *vdev; + struct list_head dmabufs_elm; + size_t size; + struct phys_vec *phys_vec; + struct p2pdma_provider *provider; + u32 nr_ranges; + struct kref kref; + struct completion comp; + u8 revoked : 1; +}; + bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); =20 @@ -110,6 +123,12 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pde= v) return (pdev->class >> 8) =3D=3D PCI_CLASS_DISPLAY_VGA; } =20 +int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf, + struct vm_area_struct *vma, + unsigned long address, + unsigned int order, + unsigned long *out_pfn); + #ifdef CONFIG_VFIO_PCI_DMABUF int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, struct vfio_device_feature_dma_buf __user *arg, --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id C01CE3FFAB6; Thu, 12 Mar 2026 18:47:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341223; cv=none; b=GQJlZZ1+8q6nyAJbWJVmgBQIegTLn4Itw1o69tjNQMG9hE6hSxhOy+6fAu5MoIpYhgEUd37GM5Av5ZJFIqDRnU5v1v7cPz+cTfRFb6dl7jQt9RL278IiGeYz7JZuNg+UV0XvOF7zN82Ws1ePWqZ9R1i1D0a5MHChvawBDP/CNMs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341223; c=relaxed/simple; bh=rPZHviA827I0RrNblyUlBtmzE+gRDe3I7XUG7V09gK8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hyIArvQZ2/Id7hleTY9Cmf1V9tHxO68uFEQfAt2bWjnh4Bzh3qDM14zxEY43ddFcF7y5PK810GHnZ+tB0qTCqPzkLfWS31C+sOKSIXjZ6rD/swZ0LFM8gouZwcKa6COuUpwUGIztREKJMEViWJJo/0YV6FuRgjoDfI8vG3Qntws= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=sQe13Fme; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="sQe13Fme" Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.18.1.11/8.18.1.11) with ESMTP id 62CHs6hG736960; Thu, 12 Mar 2026 11:46:40 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=ZrlwwZVUdcMNUJAX32tIRUCuImsZe5FshFy6zCbtrbE=; b=sQe13FmeJUaM Wt7xRE0a3ver+QCEpyBcUDV2Xsw3KljnCngbtDXhbLKBDmt1tZAHdRf+ae65fSGO 
EngT5lcQF1HVd5y11X4moY4ftfrdtPn8Et0ngQwNiZPVUJnaotF646yfYW3iYcly gkY0TmYia8qH8Cc43dmFWn2Sc01EF1IKbf6wotXlMbJXLnq6CJIbir/KJU6g/adI yWPJreIK/FML5rXRXTD1ej8Gv4Sc06XByslUjsTRMj9+MfYQ9rp4wTA+PrErZwIt H7uwzx1JPZj1oou2HG7Jjrrsx7w5ZxwmoaGzf7lF2YROFlpUvpzlwv7T3mKhGPLs hEUamEY+fg== Received: from maileast.thefacebook.com ([163.114.135.16]) by m0001303.ppops.net (PPS) with ESMTPS id 4cv29xs1t8-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:40 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1c::1b) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:37 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 04/10] vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA Date: Thu, 12 Mar 2026 11:46:02 -0700 Message-ID: <20260312184613.3710705-5-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-ORIG-GUID: yWV221Ynv3RiRAhf8YGWx40uCR8JGEaY X-Proofpoint-GUID: yWV221Ynv3RiRAhf8YGWx40uCR8JGEaY X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX/K+cZwc/d5h4 V6UbyYW1PpDsWHpDykBPG55BoC9bmrkIgEjod/kyYfCZwsghhpb7cNg2vHWh/ZDoykXawDCY3gg WNFQ8IabTZPwZMLaygob6CVGYjkVCBr0934NDsM4Vi9gT9Ue/210UIXPAUsOqCrJ3g1GHe9924S IYkrfslpsKmUb/X6On+VEYfGr8oM5p66v2bRmxtpxSEWtdAipF+2wjJHs+AyTRzKSF7/zt4nFCs 
XXqy4txRhV5cjLIARoSin8zcXja3aIPJtCxf0HecQy4b8yxTn4fV/ylb19KDsawTfSEpHdRaLr0 ebSliXkTZozR1TSrESNexyVL0pfMF1q0H8fBa9AgyHq5bJ9SiG1S9KKQT2rg2Zjb6LtFktYguTm TFDK5eYH4/H5G2d6iXfBouFh7Ly75KmiUI6A+v5o1I3Wk2MPwfFRIhX+sG5W1Qu7HqZ7o60ae14 PkOKk+c2+mLA1IpwvZg== X-Authority-Analysis: v=2.4 cv=G4YR0tk5 c=1 sm=1 tr=0 ts=69b30a10 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=Dv35txUGz5gI0hTa:21 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=_78whYxrdx1mplLwxq1U:22 a=VabnemYjAAAA:8 a=tb9bu44HMbSD3pB8wWYA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" This helper, vfio_pci_core_mmap_prep_dmabuf(), creates a single-range DMABUF for the purpose of mapping a PCI BAR. This is used in a future commit by VFIO's ordinary mmap() path. This function transfers ownership of the VFIO device fd to the DMABUF, which fput()s when it's released. Refactor the existing vfio_pci_core_feature_dma_buf() to split out export code common to the two paths, VFIO_DEVICE_FEATURE_DMA_BUF and this new VFIO_BAR mmap(). 
Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 131 +++++++++++++++++++++-------- drivers/vfio/pci/vfio_pci_priv.h | 4 + 2 files changed, 102 insertions(+), 33 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 63140528dbea..76db340ba592 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -82,6 +82,8 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmab= uf) up_write(&priv->vdev->memory_lock); vfio_device_put_registration(&priv->vdev->vdev); } + if (priv->vfile) + fput(priv->vfile); kfree(priv->phys_vec); kfree(priv); } @@ -182,6 +184,41 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf = *vpdmabuf, return -EFAULT; } =20 +static int vfio_pci_dmabuf_export(struct vfio_pci_core_device *vdev, + struct vfio_pci_dma_buf *priv, uint32_t flags, + size_t size, bool status_ok) +{ + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + + if (!vfio_device_try_get_registration(&vdev->vdev)) + return -ENODEV; + + exp_info.ops =3D &vfio_pci_dmabuf_ops; + exp_info.size =3D size; + exp_info.flags =3D flags; + exp_info.priv =3D priv; + + priv->dmabuf =3D dma_buf_export(&exp_info); + if (IS_ERR(priv->dmabuf)) { + vfio_device_put_registration(&vdev->vdev); + return PTR_ERR(priv->dmabuf); + } + + kref_init(&priv->kref); + init_completion(&priv->comp); + + /* dma_buf_put() now frees priv */ + INIT_LIST_HEAD(&priv->dmabufs_elm); + down_write(&vdev->memory_lock); + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->revoked =3D !status_ok; + list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); + dma_resv_unlock(priv->dmabuf->resv); + up_write(&vdev->memory_lock); + + return 0; +} + /* * This is a temporary "private interconnect" between VFIO DMABUF and iomm= ufd. 
* It allows the two co-operating drivers to exchange the physical address= of @@ -300,7 +337,6 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, { struct vfio_device_feature_dma_buf get_dma_buf =3D {}; struct vfio_region_dma_range *dma_ranges; - DEFINE_DMA_BUF_EXPORT_INFO(exp_info); struct vfio_pci_dma_buf *priv; size_t length; int ret; @@ -369,46 +405,20 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_cor= e_device *vdev, u32 flags, kfree(dma_ranges); dma_ranges =3D NULL; =20 - if (!vfio_device_try_get_registration(&vdev->vdev)) { - ret =3D -ENODEV; + ret =3D vfio_pci_dmabuf_export(vdev, priv, get_dma_buf.open_flags, + priv->size, + __vfio_pci_memory_enabled(vdev)); + if (ret) goto err_free_phys; - } - - exp_info.ops =3D &vfio_pci_dmabuf_ops; - exp_info.size =3D priv->size; - exp_info.flags =3D get_dma_buf.open_flags; - exp_info.priv =3D priv; - - priv->dmabuf =3D dma_buf_export(&exp_info); - if (IS_ERR(priv->dmabuf)) { - ret =3D PTR_ERR(priv->dmabuf); - goto err_dev_put; - } - - kref_init(&priv->kref); - init_completion(&priv->comp); - - /* dma_buf_put() now frees priv */ - INIT_LIST_HEAD(&priv->dmabufs_elm); - down_write(&vdev->memory_lock); - dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D !__vfio_pci_memory_enabled(vdev); - list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); - dma_resv_unlock(priv->dmabuf->resv); - up_write(&vdev->memory_lock); - /* * dma_buf_fd() consumes the reference, when the file closes the dmabuf * will be released. 
*/ ret =3D dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags); - if (ret < 0) - goto err_dma_buf; - return ret; + if (ret >=3D 0) + return ret; =20 -err_dma_buf: dma_buf_put(priv->dmabuf); -err_dev_put: vfio_device_put_registration(&vdev->vdev); err_free_phys: kfree(priv->phys_vec); @@ -419,6 +429,61 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core= _device *vdev, u32 flags, return ret; } =20 +int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev, + struct vm_area_struct *vma, + u64 phys_start, + u64 pgoff, + u64 req_len) +{ + struct vfio_pci_dma_buf *priv; + const unsigned int nr_ranges =3D 1; + int ret; + + priv =3D kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + priv->phys_vec =3D kcalloc(nr_ranges, sizeof(*priv->phys_vec), + GFP_KERNEL); + if (!priv->phys_vec) { + ret =3D -ENOMEM; + goto err_free_priv; + } + + priv->vdev =3D vdev; + priv->nr_ranges =3D nr_ranges; + priv->size =3D req_len; + priv->phys_vec[0].paddr =3D phys_start + (pgoff << PAGE_SHIFT); + priv->phys_vec[0].len =3D req_len; + + /* + * Creates a DMABUF, adds it to vdev->dmabufs list for + * tracking (meaning cleanup or revocation will zap them), and + * registers with vfio_device: + */ + ret =3D vfio_pci_dmabuf_export(vdev, priv, O_CLOEXEC, priv->size, true); + if (ret) + goto err_free_phys; + + /* + * The VMA gets the DMABUF file so that other users can locate + * the DMABUF via a VA. Ownership of the original VFIO device + * file being mmap()ed transfers to priv, and is put when the + * DMABUF is released. 
+ */ + priv->vfile =3D vma->vm_file; + vma->vm_file =3D priv->dmabuf->file; + vma->vm_private_data =3D priv; + + return 0; + +err_free_phys: + kfree(priv->phys_vec); +err_free_priv: + kfree(priv); + return ret; +} + void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) { struct vfio_pci_dma_buf *priv; diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 5cc8c85a2153..5fd3a6e00a0e 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -30,6 +30,7 @@ struct vfio_pci_dma_buf { size_t size; struct phys_vec *phys_vec; struct p2pdma_provider *provider; + struct file *vfile; u32 nr_ranges; struct kref kref; struct completion comp; @@ -128,6 +129,9 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *= vpdmabuf, unsigned long address, unsigned int order, unsigned long *out_pfn); +int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev, + struct vm_area_struct *vma, + u64 phys_start, u64 pgoff, u64 req_len); =20 #ifdef CONFIG_VFIO_PCI_DMABUF int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF4512D6E6C; Thu, 12 Mar 2026 18:46:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341217; cv=none; b=sRn5Ggbk5ZoBbmgZsw3m2wTksk1ua8AP6Pj0Mtsi0fYxT07SXO5SLziRYXBwK+jVFH5hYTfhgmvnc6BhQdonmiuJS4AEKX5Ci69+DPL/urh+4LgY7O3wV1PZuCM2blV2SGINwAffLczxarFvsNKUH8oixTNakcGqFeNXj5C0Tdk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341217; c=relaxed/simple; bh=rdm1w5qspSSlVXdGgJm7AT3pPM1P3uGk48Z/3E6fMmo=; 
From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 05/10] vfio/pci: Convert BAR mmap() to use a DMABUF Date: Thu, 12 Mar 2026 11:46:03 -0700 Message-ID: <20260312184613.3710705-6-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 
definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Convert the VFIO device fd fops->mmap to create a DMABUF representing the BAR mapping, and make the VMA fault handler look up PFNs from the corresponding DMABUF. This supports future code mmap()ing BAR DMABUFs, and iommufd work to support Type1 P2P. First, vfio_pci_core_mmap() uses the new vfio_pci_core_mmap_prep_dmabuf() helper to export a DMABUF representing a single BAR range. Then, the vfio_pci_mmap_huge_fault() callback is updated to understand revoked buffers, and uses the new vfio_pci_dma_buf_find_pfn() helper to determine the PFN for a given fault address. Now that the VFIO DMABUFs can be mmap()ed, vfio_pci_dma_buf_move() and vfio_pci_dma_buf_cleanup() need to zap PTEs on revocation and cleanup paths. CONFIG_VFIO_PCI_CORE now unconditionally depends on CONFIG_DMA_SHARED_BUFFER. CONFIG_VFIO_PCI_DMABUF remains, to conditionally include support for VFIO_DEVICE_FEATURE_DMA_BUF, and depends on CONFIG_PCI_P2PDMA. Signed-off-by: Matt Evans --- drivers/vfio/pci/Kconfig | 3 +- drivers/vfio/pci/Makefile | 3 +- drivers/vfio/pci/vfio_pci_core.c | 73 ++++++++++++++++++++---------- drivers/vfio/pci/vfio_pci_dmabuf.c | 14 ++++++ drivers/vfio/pci/vfio_pci_priv.h | 11 +---- 5 files changed, 67 insertions(+), 37 deletions(-) diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 1e82b44bda1a..bf5c64d1fe22 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -6,6 +6,7 @@ config VFIO_PCI_CORE tristate select VFIO_VIRQFD select IRQ_BYPASS_MANAGER + select DMA_SHARED_BUFFER =20 config VFIO_PCI_INTX def_bool y if !S390 @@ -56,7 +57,7 @@ config VFIO_PCI_ZDEV_KVM To enable s390x KVM vfio-pci extensions, say Y. 
=20 config VFIO_PCI_DMABUF - def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER + def_bool y if PCI_P2PDMA =20 source "drivers/vfio/pci/mlx5/Kconfig" =20 diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index e0a0757dd1d2..bab7a33a2b31 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only =20 -vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o +vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o vfio_pci_dmabuf.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D vfio_pci_zdev.o -vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) +=3D vfio_pci_dmabuf.o obj-$(CONFIG_VFIO_PCI_CORE) +=3D vfio-pci-core.o =20 vfio-pci-y :=3D vfio_pci.o diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index f9ed3374d268..41224efa58d8 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1648,18 +1648,6 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_= pci_core_device *vdev, u16 c up_write(&vdev->memory_lock); } =20 -static unsigned long vma_to_pfn(struct vm_area_struct *vma) -{ - struct vfio_pci_core_device *vdev =3D vma->vm_private_data; - int index =3D vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); - u64 pgoff; - - pgoff =3D vma->vm_pgoff & - ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); - - return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff; -} - vm_fault_t vfio_pci_vmf_insert_pfn(struct vfio_pci_core_device *vdev, struct vm_fault *vmf, unsigned long pfn, @@ -1692,23 +1680,45 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct v= m_fault *vmf, unsigned int order) { struct vm_area_struct *vma =3D vmf->vma; - struct vfio_pci_core_device *vdev =3D vma->vm_private_data; - unsigned long addr =3D vmf->address & ~((PAGE_SIZE << order) - 1); - unsigned long pgoff =3D (addr - vma->vm_start) >> PAGE_SHIFT; - unsigned long pfn 
=3D vma_to_pfn(vma) + pgoff; + struct vfio_pci_dma_buf *priv =3D vma->vm_private_data; + struct vfio_pci_core_device *vdev; + unsigned long pfn; vm_fault_t ret =3D VM_FAULT_FALLBACK; + int pres; + + vdev =3D READ_ONCE(priv->vdev); =20 - if (is_aligned_for_order(vma, addr, pfn, order)) { - scoped_guard(rwsem_read, &vdev->memory_lock) - ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); + /* + * A fault might occur after vfio_pci_dma_buf_cleanup() has + * revoked and destroyed the vdev's DMABUFs, and annulled + * vdev. After creation, vdev is only ever written in + * cleanup. + */ + if (!vdev) + return VM_FAULT_SIGBUS; + + pres =3D vfio_pci_dma_buf_find_pfn(priv, vma, vmf->address, order, &pfn); + + if (pres =3D=3D 0) { + scoped_guard(rwsem_read, &vdev->memory_lock) { + /* + * A buffer's revocation/unmap and status + * change occurs whilst holding memory_lock, + * so protects against racing faults. + */ + if (priv->revoked) + ret =3D VM_FAULT_SIGBUS; + else + ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); + } + } else if (pres !=3D -EAGAIN) { + ret =3D VM_FAULT_SIGBUS; } =20 dev_dbg_ratelimited(&vdev->pdev->dev, - "%s(,order =3D %d) BAR %ld page offset 0x%lx: 0x%x\n", - __func__, order, - vma->vm_pgoff >> - (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT), - pgoff, (unsigned int)ret); + "%s(order =3D %d) PFN 0x%lx, VA 0x%lx, pgoff 0x%lx: 0x%x\n", + __func__, order, pfn, vmf->address, vma->vm_pgoff, + (unsigned int)ret); =20 return ret; } @@ -1773,7 +1783,20 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev= , struct vm_area_struct *vma if (ret) return ret; =20 - vma->vm_private_data =3D vdev; + /* + * Create a DMABUF with a single range corresponding to this + * mapping, and wire it into vma->vm_private_data. The VMA's + * vm_file becomes that of the DMABUF, and the DMABUF takes + * ownership of the VFIO device file (put upon DMABUF + * release). This maintains the behaviour of a live VMA + * mapping holding the VFIO device file open. 
+ */ + ret =3D vfio_pci_core_mmap_prep_dmabuf(vdev, vma, + pci_resource_start(pdev, index), + pgoff, req_len); + if (ret) + return ret; + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); =20 diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 76db340ba592..197f50365ee1 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -9,6 +9,7 @@ =20 MODULE_IMPORT_NS("DMA_BUF"); =20 +#ifdef CONFIG_VFIO_PCI_DMABUF static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) { @@ -25,6 +26,7 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, =20 return 0; } +#endif /* CONFIG_VFIO_PCI_DMABUF */ =20 static void vfio_pci_dma_buf_done(struct kref *kref) { @@ -89,7 +91,9 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmab= uf) } =20 static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { +#ifdef CONFIG_VFIO_PCI_DMABUF .attach =3D vfio_pci_dma_buf_attach, +#endif .map_dma_buf =3D vfio_pci_dma_buf_map, .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, .release =3D vfio_pci_dma_buf_release, @@ -219,6 +223,7 @@ static int vfio_pci_dmabuf_export(struct vfio_pci_core_= device *vdev, return 0; } =20 +#ifdef CONFIG_VFIO_PCI_DMABUF /* * This is a temporary "private interconnect" between VFIO DMABUF and iomm= ufd. 
* It allows the two co-operating drivers to exchange the physical address= of @@ -428,6 +433,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, kfree(dma_ranges); return ret; } +#endif /* CONFIG_VFIO_PCI_DMABUF */ =20 int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev, struct vm_area_struct *vma, @@ -490,6 +496,10 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device= *vdev, bool revoked) struct vfio_pci_dma_buf *tmp; =20 lockdep_assert_held_write(&vdev->memory_lock); + /* + * Holding memory_lock ensures a racing VMA fault observes + * priv->revoked properly. + */ =20 list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!get_file_active(&priv->dmabuf->file)) @@ -507,6 +517,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device = *vdev, bool revoked) if (revoked) { kref_put(&priv->kref, vfio_pci_dma_buf_done); wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); } else { /* * Kref is initialize again, because when revoke @@ -550,6 +562,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) dma_resv_unlock(priv->dmabuf->resv); kref_put(&priv->kref, vfio_pci_dma_buf_done); wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); vfio_device_put_registration(&vdev->vdev); fput(priv->dmabuf->file); } diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 5fd3a6e00a0e..37ece9b4b5bd 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -132,13 +132,13 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf= *vpdmabuf, int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev, struct vm_area_struct *vma, u64 phys_start, u64 pgoff, u64 req_len); +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked= ); =20 
#ifdef CONFIG_VFIO_PCI_DMABUF int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, struct vfio_device_feature_dma_buf __user *arg, size_t argsz); -void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); -void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked= ); #else static inline int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, @@ -147,13 +147,6 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_dev= ice *vdev, u32 flags, { return -ENOTTY; } -static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *v= dev) -{ -} -static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, - bool revoked) -{ -} #endif =20 #endif --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27A813F7A89; Thu, 12 Mar 2026 18:46:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341222; cv=none; b=g/cAc8j+rpZAC7sFcnrPSMNLdkiFK/pSiFZxlXHemlkfFHWn8zcEzemOli1rGbYP6OeGvBItIcPSSfhpolqO3AL9TDOA4Lw9sYEg1YtBx5FQeMWLQCibQNocgHwGHP3Bqayt2haQI59N1/TABLgoDx/tMRp7R1mCq7a5/4IwjtA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341222; c=relaxed/simple; bh=hU7/9IlWIe9sxkJKxPxFHvq+tVfjwA/O2IR+SVzpBb8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Q+LIlw84dUb6S/G45mIJffO703Z7fu8yqk+ugjRu0QQgUnOt2RSmKZeF9HbrZEYpNxxjStdpsbpjd28VH1pfCXe/ouCI6aJXtUm95fN1yBVUsmBI2LIIMmawxT+yoHRlV3JOH+w3OuW5KRUtgZdunzTe+JJHf6LQKEIRKOt7dTc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass 
From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , 
Subject: [RFC v2 PATCH 06/10] vfio/pci: Remove vfio_pci_zap_bars() Date: Thu, 12 Mar 2026 11:46:04 -0700 Message-ID: <20260312184613.3710705-7-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" vfio_pci_zap_bars() and the wrapper vfio_pci_zap_and_down_write_memory_lock() are redundant as of "vfio/pci: Convert BAR mmap() to use a DMABUF". The DMABUFs used for BAR mappings already zap PTEs via the existing vfio_pci_dma_buf_move(), which notifies changes to the BAR space (e.g. around reset). 
Remove the old functions, and the various points needing to zap BARs become slightly cleaner. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_config.c | 18 ++++++------------ drivers/vfio/pci/vfio_pci_core.c | 30 +++++++----------------------- drivers/vfio/pci/vfio_pci_priv.h | 1 - 3 files changed, 13 insertions(+), 36 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci= _config.c index b4e39253f98d..c7ed28be1104 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -590,12 +590,9 @@ static int vfio_basic_config_write(struct vfio_pci_cor= e_device *vdev, int pos, virt_mem =3D !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY); new_mem =3D !!(new_cmd & PCI_COMMAND_MEMORY); =20 - if (!new_mem) { - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); + if (!new_mem) vfio_pci_dma_buf_move(vdev, true); - } else { - down_write(&vdev->memory_lock); - } =20 /* * If the user is writing mem/io enable (new_mem/io) and we @@ -712,12 +709,9 @@ static int __init init_pci_cap_basic_perm(struct perm_= bits *perm) static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vde= v, pci_power_t state) { - if (state >=3D PCI_D3hot) { - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); + if (state >=3D PCI_D3hot) vfio_pci_dma_buf_move(vdev, true); - } else { - down_write(&vdev->memory_lock); - } =20 vfio_pci_set_power_state(vdev, state); if (__vfio_pci_memory_enabled(vdev)) @@ -908,7 +902,7 @@ static int vfio_exp_config_write(struct vfio_pci_core_d= evice *vdev, int pos, &cap); =20 if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) { - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); vfio_pci_dma_buf_move(vdev, true); pci_try_reset_function(vdev->pdev); if (__vfio_pci_memory_enabled(vdev)) @@ -993,7 +987,7 @@ static int vfio_af_config_write(struct vfio_pci_core_de= vice *vdev, int pos, &cap); =20 if (!ret && (cap & 
PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) { - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); vfio_pci_dma_buf_move(vdev, true); pci_try_reset_function(vdev->pdev); if (__vfio_pci_memory_enabled(vdev)) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 41224efa58d8..9e9ad97c2f7f 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -319,7 +319,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_co= re_device *vdev, * The vdev power related flags are protected with 'memory_lock' * semaphore. */ - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); vfio_pci_dma_buf_move(vdev, true); =20 if (vdev->pm_runtime_engaged) { @@ -1229,7 +1229,7 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_= device *vdev, if (!vdev->reset_works) return -EINVAL; =20 - vfio_pci_zap_and_down_write_memory_lock(vdev); + down_write(&vdev->memory_lock); =20 /* * This function can be invoked while the power state is non-D0. 
If @@ -1613,22 +1613,6 @@ ssize_t vfio_pci_core_write(struct vfio_device *core= _vdev, const char __user *bu } EXPORT_SYMBOL_GPL(vfio_pci_core_write); =20 -static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev) -{ - struct vfio_device *core_vdev =3D &vdev->vdev; - loff_t start =3D VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX); - loff_t end =3D VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX); - loff_t len =3D end - start; - - unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true); -} - -void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *= vdev) -{ - down_write(&vdev->memory_lock); - vfio_pci_zap_bars(vdev); -} - u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev) { u16 cmd; @@ -2487,10 +2471,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_d= evice_set *dev_set, } =20 /* - * Take the memory write lock for each device and zap BAR - * mappings to prevent the user accessing the device while in - * reset. Locking multiple devices is prone to deadlock, - * runaway and unwind if we hit contention. + * Take the memory write lock for each device and + * revoke all DMABUFs, which will prevent any access + * to the device while in reset. Locking multiple + * devices is prone to deadlock, runaway and unwind if + * we hit contention. 
*/ if (!down_write_trylock(&vdev->memory_lock)) { ret =3D -EBUSY; @@ -2498,7 +2483,6 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_dev= ice_set *dev_set, } =20 vfio_pci_dma_buf_move(vdev, true); - vfio_pci_zap_bars(vdev); } =20 if (!list_entry_is_head(vdev, diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 37ece9b4b5bd..e201c96bbb14 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -78,7 +78,6 @@ void vfio_config_free(struct vfio_pci_core_device *vdev); int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t state); =20 -void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *= vdev); u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev); void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 cmd); --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C02733FFAB7; Thu, 12 Mar 2026 18:47:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341223; cv=none; b=WQK5vNQsJtL1pN1748qIGpQFEKatq3yGZmioS6rX49xk21dLggfS8fVlMWF1wNmUFfaSzDJP09Z53eG/kNgEzs09GuQlrWxSVAcFunljvM1JHEaDXlnsKMOZzVpjICDfH/p0mVrgF1zQsoULm/iqZsGAkPBk/TyciXaP34nN2TQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341223; c=relaxed/simple; bh=GOdY4wd/HctSiuzrEenqd9IFEk1ypoMJCGQ+RTBImuI=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ooq8ZqgKCYhdb1uUxINf2o0AmA6fgIrUqV13dIwb0V7WdJ1Dus38zaFlpcKodwBGZPSR4h5s/hdfbeQFIiFA9Hh3LuF9wWyispDBzacuHor+xG+RQDmoyYrBQbxnskHdUHYCru9X3Vy2h/9XjbMTA+PqcBxHY3KTayEfU9pCmoY= 
From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , 
=?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 07/10] vfio/pci: Support mmap() of a VFIO DMABUF Date: Thu, 12 Mar 2026 11:46:05 -0700 Message-ID: <20260312184613.3710705-8-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A VFIO DMABUF can export a subset of a BAR to userspace by fd; add support for mmap() of this fd. 
This provides another route for a process to map BARs, except one where the process can only map a specific subset of a BAR represented by the exported DMABUF. mmap() support enables userspace driver designs that safely delegate access to BAR sub-ranges to other client processes by sharing a DMABUF fd, without having to share the (omnipotent) VFIO device fd with them. Since the main VFIO BAR mmap() path is now DMABUF-aware, this path reuses the existing vm_ops. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_core.c | 2 +- drivers/vfio/pci/vfio_pci_dmabuf.c | 28 ++++++++++++++++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 2 ++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 9e9ad97c2f7f..4f411a0b980c 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1712,7 +1712,7 @@ static vm_fault_t vfio_pci_mmap_page_fault(struct vm_= fault *vmf) return vfio_pci_mmap_huge_fault(vmf, 0); } =20 -static const struct vm_operations_struct vfio_pci_mmap_ops =3D { +const struct vm_operations_struct vfio_pci_mmap_ops =3D { .fault =3D vfio_pci_mmap_page_fault, #ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP .huge_fault =3D vfio_pci_mmap_huge_fault, diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 197f50365ee1..ab665db66904 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -26,6 +26,33 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabu= f, =20 return 0; } + +static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_st= ruct *vma) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + u64 req_len, req_start; + + if (priv->revoked) + return -ENODEV; + if ((vma->vm_flags & VM_SHARED) =3D=3D 0) + return -EINVAL; + + req_len =3D vma->vm_end - vma->vm_start; + req_start =3D vma->vm_pgoff << PAGE_SHIFT; + if (req_start + req_len > priv->size) + return -EINVAL; 
+ + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); + vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); + + /* See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. */ + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | + VM_DONTEXPAND | VM_DONTDUMP); + vma->vm_private_data =3D priv; + vma->vm_ops =3D &vfio_pci_mmap_ops; + + return 0; +} #endif /* CONFIG_VFIO_PCI_DMABUF */ =20 static void vfio_pci_dma_buf_done(struct kref *kref) @@ -93,6 +120,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dma= buf) static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { #ifdef CONFIG_VFIO_PCI_DMABUF .attach =3D vfio_pci_dma_buf_attach, + .mmap =3D vfio_pci_dma_buf_mmap, #endif .map_dma_buf =3D vfio_pci_dma_buf_map, .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index e201c96bbb14..b16a8d22563c 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -37,6 +37,8 @@ struct vfio_pci_dma_buf { u8 revoked : 1; }; =20 +extern const struct vm_operations_struct vfio_pci_mmap_ops; + bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); =20 --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D2BB3FFABB; Thu, 12 Mar 2026 18:47:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341225; cv=none; b=YYHZ2sfuGHG14jzOJ444t1GUvWVlt5KmkX9gihDXYK1y3kMZjndA1VifVLYgSdoyZI4so+io96ryts5cro9Y4ikseZ8JF61+LZ/QWd1hsMj7CF6hHoEYOZbvzLBdCkWMPl/dbX0T1DWjwkf/sFqWBletzob1xkrYlyRc6o+iEPA= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341225; c=relaxed/simple; bh=Utw/Es7pXLWXcN8K+dQHB/cNk/s8HBwsdfPDuj6wStA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=O+Xr5yQz4NoqKHeQKGyTKKcOh2sJK8unkHS+OQvUSx0J15HP7E6N9vJ1/kR4sBh1S6rkLGM911app7hz/dzpIWXxcgr+8+nCirGuQ68NsRxF74oTctGes6imMME2qBW/gnMj4qHslmv4gtI8tYtk7KsBqeVTdCAitVFtBZF2gQ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=u2tzEWYC; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="u2tzEWYC" Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 62CGlVwu2379095; Thu, 12 Mar 2026 11:46:49 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=S9Q1m7Yawtoc96ydzMvMquqkIEeOdQ3R30MlZU1sEBY=; b=u2tzEWYCBQ0d NrjMBD7z6OjDS9pcEp0bjxszrsZbfPEzpKzEDTKyf9t8TcI+4LjeopaZRnvW/uLj ePfgZ73J8LVPgh/p5na7fEPbHX/HNInQn0aPsWlv9K0QQknKxh9wrXVJEkRm1ALj CRGeu+R3wp9ehtnCXZpww3ZyiH1wTKFPQPv7YGdVA0gKmfA8CNjEZ8gC+WWH4hrn Fx3dYAX/xQnX/pN+aO8besxYJ2zBwmzKiI/Uwm1bsO1kmleaEp8yamHJcRAbFOLF 10Ng60bzPECiaKlHwxzjNEq1FzIsOtFReokMAaXpkWo1qhiah+1UZkTDF4V84Y2d Fk8qtNOdng== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cv1araj18-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:49 -0700 (PDT) Received: 
from localhost (2620:10d:c0a8:fe::f072) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:48 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 08/10] vfio/pci: Permanently revoke a DMABUF on request Date: Thu, 12 Mar 2026 11:46:06 -0700 Message-ID: <20260312184613.3710705-9-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-ORIG-GUID: l-6exdQ2mUDfMVKoUXcgg-AHq58Jx64f X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX59qQJdyiusot LNnaykwobaXYBrYKroTvvZzrcI5N57VMUK1Q61CVooUv3SyN3/O9tC/U1PLMVRFDmZoeQ11vbAQ M4ocF8/R+KdtA9fi1ZRLHNKDS4jXfymZSk+HwDXaY7UVPqxS/AF/nQJFbRfDCxc4Tlkk51b4yYS bnTU4/k02zchJJA9uUdmqq8RqAz1C4U3QjRtie3zDbgXoAsTt0X4v0G3sUlz0ywRTBZKFdiR5HK e6aXjTCgu0ZJb5udPCVPMF1qdB4iZkttzyXsI/bynOkCSQPm3wX7fsRbY2Di8mdIfEMzMNaTZ6L JHGjZ6PDeSriTlx5qPLaB2OFNTsdO0+7iyfArxuGX57yihDDH2qPTwzQ6uHdsPrMzgdVfAjDgpf rWXLVZmtnLJel+qDqxn03BXlKf3mYd/f1j7Y18k9WbhawIQF0AUFm/pOmTV+l+vbOZiiJ+5Gy1T P+MaGJGHAwJc2pbO9Kw== X-Proofpoint-GUID: l-6exdQ2mUDfMVKoUXcgg-AHq58Jx64f X-Authority-Analysis: v=2.4 cv=NYPrFmD4 c=1 sm=1 tr=0 ts=69b30a19 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=wpfVPzegXHpEFt3DAXn9:22 a=VabnemYjAAAA:8 a=GvNw_QJiROG6ZKnmDkIA:9 a=gKebqoRLp9LExxC7YDUY:22 
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Expand the VFIO DMABUF revocation state to three states: Not revoked, temporarily revoked, and permanently revoked. The first two are for existing transient revocation, e.g. across a function reset, and the DMABUF is put into the last in response to a new ioctl(VFIO_DEVICE_PCI_DMABUF_REVOKE) request. This VFIO device fd ioctl passes a DMABUF by fd and requests that the DMABUF is permanently revoked. On success, it's guaranteed that the buffer can never be imported/attached/mmap()ed in future, that dynamic imports have been cleanly detached, and all mappings made inaccessible/PTEs zapped. This is useful for lifecycle management, to reclaim VFIO PCI BAR ranges previously delegated to a subordinate client process: The driver process can ensure that the loaned resources are revoked when the client is deemed "done", and exported ranges can be safely re-used elsewhere. 
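For illustration only (not part of this patch), the userspace side of the request described above might look roughly like the sketch below. The struct layout and ioctl number mirror this patch's uapi hunk and are redefined locally on the assumption that installed headers do not carry them yet; revoke_dmabuf() and the SKETCH_* names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef VFIO_DEVICE_PCI_DMABUF_REVOKE
/* Same values as VFIO_TYPE and VFIO_BASE in <linux/vfio.h>. */
#define SKETCH_VFIO_TYPE (';')
#define SKETCH_VFIO_BASE 100

/* Mirrors the struct proposed in this patch; not yet in released headers. */
struct vfio_pci_dmabuf_revoke {
	uint32_t argsz;
	uint32_t dmabuf_fd;
};
#define VFIO_DEVICE_PCI_DMABUF_REVOKE \
	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 22)
#endif

/*
 * Hypothetical helper: ask the VFIO device fd to permanently revoke the
 * DMABUF identified by dmabuf_fd.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int revoke_dmabuf(int device_fd, int dmabuf_fd)
{
	struct vfio_pci_dmabuf_revoke req;

	memset(&req, 0, sizeof(req));
	req.argsz = sizeof(req);
	req.dmabuf_fd = (uint32_t)dmabuf_fd;

	if (ioctl(device_fd, VFIO_DEVICE_PCI_DMABUF_REVOKE, &req) < 0)
		return -errno;
	return 0;
}
```

A driver process would call this once the client is deemed done, and only re-use the exported BAR range after it returns 0. Going by the uapi comment in this patch, -EBADFD would indicate the buffer had already been revoked.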
Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_core.c | 16 +++- drivers/vfio/pci/vfio_pci_dmabuf.c | 136 ++++++++++++++++++++--------- drivers/vfio/pci/vfio_pci_priv.h | 14 ++- include/uapi/linux/vfio.h | 30 +++++++ 4 files changed, 154 insertions(+), 42 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 4f411a0b980c..c7760dd3a5f0 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1461,6 +1461,18 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_= core_device *vdev, ioeventfd.fd); } =20 +static int vfio_pci_ioctl_dmabuf_revoke(struct vfio_pci_core_device *vdev, + struct vfio_pci_dmabuf_revoke __user *arg) +{ + unsigned long minsz =3D offsetofend(struct vfio_pci_dmabuf_revoke, dmabuf= _fd); + struct vfio_pci_dmabuf_revoke revoke; + + if (copy_from_user(&revoke, arg, minsz)) + return -EFAULT; + + return vfio_pci_dma_buf_revoke(vdev, revoke.dmabuf_fd); +} + long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, unsigned long arg) { @@ -1483,6 +1495,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vde= v, unsigned int cmd, return vfio_pci_ioctl_reset(vdev, uarg); case VFIO_DEVICE_SET_IRQS: return vfio_pci_ioctl_set_irqs(vdev, uarg); + case VFIO_DEVICE_PCI_DMABUF_REVOKE: + return vfio_pci_ioctl_dmabuf_revoke(vdev, uarg); default: return -ENOTTY; } @@ -1690,7 +1704,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_= fault *vmf, * change occurs whilst holding memory_lock, * so protects against racing faults. 
*/ - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) ret =3D VM_FAULT_SIGBUS; else ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index ab665db66904..362207cf7e71 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -18,7 +18,7 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, if (!attachment->peer2peer) return -EOPNOTSUPP; =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 if (!dma_buf_attach_revocable(attachment)) @@ -32,7 +32,7 @@ static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, = struct vm_area_struct * struct vfio_pci_dma_buf *priv =3D dmabuf->priv; u64 req_len, req_start; =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; if ((vma->vm_flags & VM_SHARED) =3D=3D 0) return -EINVAL; @@ -72,7 +72,7 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachmen= t, =20 dma_resv_assert_held(priv->dmabuf->resv); =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return ERR_PTR(-ENODEV); =20 ret =3D dma_buf_phys_vec_to_sgt(attachment, priv->provider, @@ -243,7 +243,8 @@ static int vfio_pci_dmabuf_export(struct vfio_pci_core_= device *vdev, INIT_LIST_HEAD(&priv->dmabufs_elm); down_write(&vdev->memory_lock); dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D !status_ok; + priv->status =3D status_ok ? 
VFIO_PCI_DMABUF_OK : + VFIO_PCI_DMABUF_TEMP_REVOKED; list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); dma_resv_unlock(priv->dmabuf->resv); up_write(&vdev->memory_lock); @@ -274,7 +275,7 @@ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachm= ent *attachment, return -EOPNOTSUPP; =20 priv =3D attachment->dmabuf->priv; - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 /* More than one range to iommufd will require proper DMABUF support */ @@ -518,6 +519,48 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_cor= e_device *vdev, return ret; } =20 +static void __vfio_pci_dma_buf_revoke(struct vfio_pci_dma_buf *priv, bool = revoked, + bool permanently) +{ + if ((priv->status =3D=3D VFIO_PCI_DMABUF_PERM_REVOKED) || + (priv->status =3D=3D VFIO_PCI_DMABUF_OK && !revoked) || + (priv->status =3D=3D VFIO_PCI_DMABUF_TEMP_REVOKED && revoked && !perm= anently)) { + return; + } + + dma_resv_lock(priv->dmabuf->resv, NULL); + if (revoked) + priv->status =3D permanently ? + VFIO_PCI_DMABUF_PERM_REVOKED : VFIO_PCI_DMABUF_TEMP_REVOKED; + dma_buf_invalidate_mappings(priv->dmabuf); + dma_resv_wait_timeout(priv->dmabuf->resv, + DMA_RESV_USAGE_BOOKKEEP, false, + MAX_SCHEDULE_TIMEOUT); + dma_resv_unlock(priv->dmabuf->resv); + if (revoked) { + kref_put(&priv->kref, vfio_pci_dma_buf_done); + wait_for_completion(&priv->comp); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); + } else { + /* + * Kref is initialize again, because when revoke + * was performed the reference counter was decreased + * to zero to trigger completion. + */ + kref_init(&priv->kref); + /* + * There is no need to wait as no mapping was + * performed when the previous status was + * priv->status =3D=3D *REVOKED. 
+ */ + reinit_completion(&priv->comp); + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->status =3D VFIO_PCI_DMABUF_OK; + dma_resv_unlock(priv->dmabuf->resv); + } +} + void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) { struct vfio_pci_dma_buf *priv; @@ -526,45 +569,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_devic= e *vdev, bool revoked) lockdep_assert_held_write(&vdev->memory_lock); /* * Holding memory_lock ensures a racing VMA fault observes - * priv->revoked properly. + * priv->status properly. */ =20 list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!get_file_active(&priv->dmabuf->file)) continue; - - if (priv->revoked !=3D revoked) { - dma_resv_lock(priv->dmabuf->resv, NULL); - if (revoked) - priv->revoked =3D true; - dma_buf_invalidate_mappings(priv->dmabuf); - dma_resv_wait_timeout(priv->dmabuf->resv, - DMA_RESV_USAGE_BOOKKEEP, false, - MAX_SCHEDULE_TIMEOUT); - dma_resv_unlock(priv->dmabuf->resv); - if (revoked) { - kref_put(&priv->kref, vfio_pci_dma_buf_done); - wait_for_completion(&priv->comp); - unmap_mapping_range(priv->dmabuf->file->f_mapping, - 0, priv->size, 1); - } else { - /* - * Kref is initialize again, because when revoke - * was performed the reference counter was decreased - * to zero to trigger completion. - */ - kref_init(&priv->kref); - /* - * There is no need to wait as no mapping was - * performed when the previous status was - * priv->revoked =3D=3D true. 
- */ - reinit_completion(&priv->comp); - dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D false; - dma_resv_unlock(priv->dmabuf->resv); - } - } + __vfio_pci_dma_buf_revoke(priv, revoked, false); fput(priv->dmabuf->file); } } @@ -582,7 +593,7 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) dma_resv_lock(priv->dmabuf->resv, NULL); list_del_init(&priv->dmabufs_elm); priv->vdev =3D NULL; - priv->revoked =3D true; + priv->status =3D VFIO_PCI_DMABUF_PERM_REVOKED; dma_buf_invalidate_mappings(priv->dmabuf); dma_resv_wait_timeout(priv->dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, @@ -597,3 +608,48 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_dev= ice *vdev) } up_write(&vdev->memory_lock); } + +#ifdef CONFIG_VFIO_PCI_DMABUF +int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vdev, int dmabuf_= fd) +{ + struct vfio_pci_core_device *db_vdev; + struct dma_buf *dmabuf; + struct vfio_pci_dma_buf *priv; + int ret =3D 0; + + dmabuf =3D dma_buf_get(dmabuf_fd); + if (IS_ERR(dmabuf)) + return PTR_ERR(dmabuf); + + /* + * The DMABUF is a user-provided fd, so sanity-check it's + * really a vfio_pci_dma_buf _and_ relates to the VFIO device + * that it was provided for: + */ + if (dmabuf->ops !=3D &vfio_pci_dmabuf_ops) { + ret =3D -ENODEV; + goto out_put_buf; + } + + priv =3D dmabuf->priv; + db_vdev =3D READ_ONCE(priv->vdev); + + if (!db_vdev || db_vdev !=3D vdev) { + ret =3D -ENODEV; + goto out_put_buf; + } + + scoped_guard(rwsem_read, &vdev->memory_lock) { + if (priv->status =3D=3D VFIO_PCI_DMABUF_PERM_REVOKED) { + ret =3D -EBADFD; + break; + } + __vfio_pci_dma_buf_revoke(priv, true, true); + } + + out_put_buf: + dma_buf_put(dmabuf); + + return ret; +} +#endif /* CONFIG_VFIO_PCI_DMABUF */ diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index b16a8d22563c..c5a9e06bf81a 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -23,6 +23,12 @@ struct vfio_pci_ioeventfd { bool 
test_mem; }; =20 +enum vfio_pci_dma_buf_status { + VFIO_PCI_DMABUF_OK =3D 0, + VFIO_PCI_DMABUF_TEMP_REVOKED =3D 1, + VFIO_PCI_DMABUF_PERM_REVOKED =3D 2, +}; + struct vfio_pci_dma_buf { struct dma_buf *dmabuf; struct vfio_pci_core_device *vdev; @@ -34,7 +40,7 @@ struct vfio_pci_dma_buf { u32 nr_ranges; struct kref kref; struct completion comp; - u8 revoked : 1; + enum vfio_pci_dma_buf_status status; }; =20 extern const struct vm_operations_struct vfio_pci_mmap_ops; @@ -140,6 +146,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device = *vdev, bool revoked); int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, struct vfio_device_feature_dma_buf __user *arg, size_t argsz); +int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vdev, int dmabuf_= fd); #else static inline int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, @@ -148,6 +155,11 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_dev= ice *vdev, u32 flags, { return -ENOTTY; } +static inline int vfio_pci_dma_buf_revoke(struct vfio_pci_core_device *vde= v, + int dmabuf_fd) +{ + return -ENODEV; +} #endif =20 #endif diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index bb7b89330d35..c1b3fa880aa1 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1307,6 +1307,36 @@ struct vfio_precopy_info { =20 #define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21) =20 +/** + * VFIO_DEVICE_PCI_DMABUF_REVOKE - _IO(VFIO_TYPE, VFIO_BASE + 22) + * + * This ioctl is used on the device FD, and requests that access to + * the buffer corresponding to the DMABUF FD parameter is immediately + * and permanently revoked. On successful return, the buffer is not + * accessible through any mmap() or dma-buf import. The request fails + * if the buffer is pinned; otherwise, the exporter marks the buffer + * as inaccessible and uses the move_notify callback to inform + * importers of the change. 
The buffer is permanently disabled, and + * VFIO refuses all map, mmap, attach, etc. requests. + * + * Returns: + * + * Return: 0 on success, -1 and errno set on failure: + * + * ENODEV if the associated dmabuf FD no longer exists/is closed, + * or is not a DMABUF created for this device. + * EINVAL if the dmabuf_fd parameter isn't a DMABUF. + * EBADF if the dmabuf_fd parameter isn't a valid file number. + * EBADFD if the buffer has already been revoked. + * + */ +struct vfio_pci_dmabuf_revoke { + __u32 argsz; + __u32 dmabuf_fd; +}; + +#define VFIO_DEVICE_PCI_DMABUF_REVOKE _IO(VFIO_TYPE, VFIO_BASE + 22) + /* * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low p= ower * state with the platform-based power management. Device use of lower po= wer --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 93588401A1C; Thu, 12 Mar 2026 18:47:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341229; cv=none; b=K9SGkO+qyd2xPAPnpNCXMU7drVkFhWObdrsexN9GiRUsEdXd12gNSRJjm1TS3GnBTcCctBvlJ/yB/BiFYeXCrGMyUw1Tq6vGI+eOqkjHoNy0uMbcwVUGCmWly+sLmFuPjizZbxkX6mJ0Lbuwu6dhDh8Z6gM5wnE09D2R+IXz4oo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341229; c=relaxed/simple; bh=Bacgd0qQjADZty87pzUieZ2F62uqHuBS0aNqi3JHHmg=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IVFi2aEVW+i0bcxzmJSZ/hG8y5zo59470FPeRQNvYGm8Xey+y7NRKCMjWNJmnHVchcanSPOcuRzf43bQTMSoDuNTnXPeDhv8muNLl4irmI9OoC1FhM3VI5EGpaMFjPkUm6tAJeL5xigfkirO3whHYhczPslLWpwG4mwJLdE9gK4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) 
header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=sg4ubl55; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="sg4ubl55" Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 62CHOTDS1805133; Thu, 12 Mar 2026 11:46:52 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=giWODZ+S8k6fpZRnxvO34oUUO4s5pygsXFnhWvTEKzg=; b=sg4ubl55NazG qVUHo/c+K2tCbDe5hm6/mVJQUJrLK96pFKHPkpWIYXGNjfBVIvC5T+41fe5AWCwa zALcvvLTQ5++1L+5+nq1to5CBxitHPiOu7Bw2KQgldXpEn/Za2y4EknLJio+1hAP DdGlkta7a8O9Cx9dLgsIE/xgXbR0SAf6sQ4FJNaOKe5oIs/aingQUnoySXzXQT0C QzdmbHJ/TBMOpOcN/jZ9zZL6kf0/YOgjouEoW+6y41Cx0/tE4hUcWlIH1QEd+9od V4o5FFCBNxroiLVJYGJN+XO3IiirP25+kP5XxGDF9j+nHmxcSSY6TkhMRtX+lqWb 9kqt7osEAA== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cuytuvk6x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:52 -0700 (PDT) Received: from localhost (2620:10d:c085:208::7cb7) by mail.thefacebook.com (2620:10d:c08b:78::c78f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:51 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , 
Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 09/10] vfio/pci: Add mmap() attributes to DMABUF feature Date: Thu, 12 Mar 2026 11:46:07 -0700 Message-ID: <20260312184613.3710705-10-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Authority-Analysis: v=2.4 cv=WKtyn3sR c=1 sm=1 tr=0 ts=69b30a1c cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=JnKecZnUtZousrUlYMGU:22 a=VabnemYjAAAA:8 a=ZZip-zt6qTTd7FoaoeoA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-ORIG-GUID: RFpW0x6_sBMQCOROfH6RjBKGMUM_JHHy X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX/r6l02q19NCa Hukc4vvWd5CNLxy6AloCnP4dOLsgPctiR2Fw/lDsGBe6oeEcGk+2q/SSH/0jZLyhsLGh363UkKp tsLATz9toKL8Sl/JfJhfH+ihnRbPDhp2na/SodIeVdIflIuhwRATy6d2uUrKkrW7GtYS3Jwi4SZ OvvOKFCC17LsxfeJjp5pZEPWONdgzVfATXmKuT5NkOLk6Y3eToNtmtCRIqkHb14b1Y7rMrLL8zC Mm/n+HZm7k2NLjWXSDVW1DhxS0hg3FvsSGgvmJzSLPHIwlsalx3wltZmYQ9srJql+Y9nc1/zdHo j6VLGND/SycL8wTKLzTQn9yvfIp0WMWBhHCvo+XxbbSjeBKzSSJl22lcySX2/f1XcS7a6I+GKFe 9Jk0oMG4ofXh6KxB9rhxmGWB5fDbdJsuEOORIGjxKFh6KXuTP0mGXTLEDRl666+TJvG45mfD73V qWJFNzbEavhU5JhZEWQ== X-Proofpoint-GUID: RFpW0x6_sBMQCOROfH6RjBKGMUM_JHHy X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" A new field is reserved in vfio_device_feature_dma_buf.flags to request CPU-facing memory type attributes for mmap()s of the buffer. Add a flag VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC, which results in WC PTEs for the DMABUF's BAR region. 
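As a rough sketch (not part of this patch), the attribute sub-field and the acceptance rule applied by validate_dmabuf_input() in the diff below can be written as a small standalone predicate. The macro values mirror this patch's uapi hunk, with an unsigned suffix added here so that the shifts are well defined in plain C; dmabuf_flags_valid() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the proposed uapi values (unsigned suffix added in this sketch). */
#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK	(0xfu << 28)
#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_UC	(0u << 28)
#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC	(1u << 28)

/*
 * Hypothetical helper: true if 'flags' would be accepted by the check
 * this patch adds, i.e. no bits set outside the attribute field, and
 * the attribute is either UC (the default, 0) or WC.
 */
static bool dmabuf_flags_valid(uint32_t flags)
{
	uint32_t attr = flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK;

	if (flags & ~VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK)
		return false;	/* bits outside the attribute field */

	return attr == VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_UC ||
	       attr == VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC;
}
```

Userspace requesting WC would set flags to VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC when exporting the DMABUF; leaving flags at 0 keeps the existing uncached behaviour.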
Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 15 +++++++++++++-- drivers/vfio/pci/vfio_pci_priv.h | 1 + include/uapi/linux/vfio.h | 12 +++++++++--- 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 362207cf7e71..ed5b80f6911e 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -42,7 +42,10 @@ static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf,= struct vm_area_struct * if (req_start + req_len > priv->size) return -EINVAL; =20 - vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); + if (priv->attrs =3D=3D VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC) + vma->vm_page_prot =3D pgprot_writecombine(vma->vm_page_prot); + else + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); =20 /* See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. */ @@ -343,6 +346,12 @@ static int validate_dmabuf_input(struct vfio_device_fe= ature_dma_buf *dma_buf, size_t length =3D 0; u32 i; =20 + if ((dma_buf->flags !=3D 0) && + ((dma_buf->flags & ~VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) || + ((dma_buf->flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) !=3D + VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC))) + return -EINVAL; + for (i =3D 0; i < dma_buf->nr_ranges; i++) { u64 offset =3D dma_ranges[i].offset; u64 len =3D dma_ranges[i].length; @@ -386,7 +395,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf))) return -EFAULT; =20 - if (!get_dma_buf.nr_ranges || get_dma_buf.flags) + if (!get_dma_buf.nr_ranges) return -EINVAL; =20 /* @@ -429,6 +438,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, priv->vdev =3D vdev; priv->nr_ranges =3D get_dma_buf.nr_ranges; priv->size =3D length; + priv->attrs =3D get_dma_buf.flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK; 
ret =3D vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider, get_dma_buf.region_index, priv->phys_vec, dma_ranges, @@ -488,6 +498,7 @@ int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core= _device *vdev, priv->vdev =3D vdev; priv->nr_ranges =3D nr_ranges; priv->size =3D req_len; + priv->attrs =3D 0; priv->phys_vec[0].paddr =3D phys_start + (pgoff << PAGE_SHIFT); priv->phys_vec[0].len =3D req_len; =20 diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index c5a9e06bf81a..562de3cc88f4 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -40,6 +40,7 @@ struct vfio_pci_dma_buf { u32 nr_ranges; struct kref kref; struct completion comp; + u32 attrs; enum vfio_pci_dma_buf_status status; }; =20 diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index c1b3fa880aa1..fbbe1adea533 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1521,7 +1521,9 @@ struct vfio_device_feature_bus_master { * etc. offset/length specify a slice of the region to create the dmabuf f= rom. * nr_ranges is the total number of (P2P DMA) ranges that comprise the dma= buf. * - * flags should be 0. + * flags contains: + * - A field for userspace mapping attribute: by default, suitable for reg= ular + * MMIO. Alternate attributes (such as WC) can be selected. * * Return: The fd number on success, -1 and errno is set on failure. 
*/ @@ -1535,8 +1537,12 @@ struct vfio_region_dma_range { struct vfio_device_feature_dma_buf { __u32 region_index; __u32 open_flags; - __u32 flags; - __u32 nr_ranges; + __u32 flags; + /* Flags sub-field reserved for attribute enum */ +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK (0xf << 28) +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_UC (0 << 28) +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC (1 << 28) + __u32 nr_ranges; struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges); }; =20 --=20 2.47.3 From nobody Tue Apr 7 14:36:49 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F274E3FF89B; Thu, 12 Mar 2026 18:47:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341237; cv=none; b=J9mzepPe4NlzroHZyDP6s7s/9or0PdJZWIV4TokK9AcfOTWkcXAJvAb3WBw99V9k2a7l7cTQll1cQYur9p6lnc6NyADxC/qM+F0IdcF1c2TCYmEqRGroreAc+rAHICxIJMj3GmIkGAnRpsGJwXOU83MHiZxGjafLWL92KhHSYrE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773341237; c=relaxed/simple; bh=HNh11XFi84yYZNFFhW//pDxYt6uXPJLkOgPsLAjj9NE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FjAB0Gv1eiOncKeFQcmqP3zulrz0C69DnR1qSkpJ55AgaIUJTbQcMNgCdmP8Tzz/ZYUaNKr21Hqsr1Mc6J/taoGLR/N1Ll+sKConjHXFYQhO2TksmBo7jHmsXg56LrDNUWRpCqB2rBowfvQpaDjWY6MxrS9+QBjhvQOw+VSSzwM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=YCAsrIgw; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="YCAsrIgw" Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 62CHck2Y2644514; Thu, 12 Mar 2026 11:46:55 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=6we7Ybl6OTY7SbN6Oz+gCPMPWdmByM0I9ZAh7c0Hvnc=; b=YCAsrIgwNor4 4Cfi8WfJWu1WU74BpsSel18NV9KZenk2DQyMrhYWKuz1v0EoWLtX1hXM9uAa9gXA l249IKtIp8HwumV23kIdJiDcOAtxYjdM6BGhjmEwnNAt8/KTdxDq6rFN62uR5eBP N/zcd6JBXg0O/KeKcRd/7YcTCGu6QlAX943SHTNcOGzSpCeysZMnicIWW4B1ydms qwc2r/E6ZjfuIPRgM6ud9BDpWXuakZJDLrvfg0T947Im+osmOL62Q2czbl3Va+Sc u+j0oJkVIGWt67uH3SJx/4XNSZQV+Ru+o5IaM/584i/zjKWnZKmyHf84ia5+laHp zdZJ14vS7g== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cv22s1e65-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 12 Mar 2026 11:46:54 -0700 (PDT) Received: from localhost (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c08b:78::c78f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 12 Mar 2026 18:46:53 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC v2 PATCH 10/10] [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test Date: Thu, 12 Mar 2026 11:46:08 -0700 Message-ID: <20260312184613.3710705-11-mattev@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: 
<20260312184613.3710705-1-mattev@meta.com> References: <20260312184613.3710705-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: toqB-O_fvTEf_-E36ZywKgKMKovF47Zk X-Authority-Analysis: v=2.4 cv=batmkePB c=1 sm=1 tr=0 ts=69b30a1e cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=Yq5XynenixoA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=tpM8CJlwf7uhpglF1g9U:22 a=VabnemYjAAAA:8 a=jgVWRkJC2vtQ8RevbvwA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMzEyMDE1MSBTYWx0ZWRfX3ISlXPXNqRFL UoEFS6OozC9hlu5A8wBM2OQ+KWGfC2ojExBjNCzUN+smGJK7kPQgFxMI/7wpWgikeIwpYBwYH3W 9iv3eVN1Gqt67OPaoMSNSmdU8WsQUxJvc9qDSRR6mvdYoTvl6AtYkNeFRCvKasdPdEV/s3b0hXP ZKIVsFbritC+KiZ7onWMUk7j7a+y55R2COWkHWFoUvT9t+6edLJqbGxp2JhNzXHLoGg379+cj1c D9HH2tj3b5tag5guXvXe5NE0v1Z6RZR6LDyvqz4/sDElSqZV6JA1KUBlY/uP/VUu2ts7HAXYvBu OwSty4+gf2r2tdAiRKBvRcdDIi/ZYl6/CsXOcE1I36fQAgZY+8kV7B6wKzeo62wG5Sqx9r0gXad VNldG2GBdmV0VRAQwa5oJCiR1L17myl6idcRTOAPCJOAMayitruBkWCk0iUE6CzN1w+VvopuW2E YshLEVADDYAySWTZjzg== X-Proofpoint-ORIG-GUID: toqB-O_fvTEf_-E36ZywKgKMKovF47Zk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-03-12_02,2026-03-12_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" This test exercises VFIO DMABUF mmap() to userspace, including various revocation/shutdown cases (which make the VMA inaccessible). This is a TEMPORARY test, just to illustrate a new UAPI and DMABUF/mmap() usage. Since it originates from out-of-tree code, it duplicates some of the VFIO device setup code in .../selftests/vfio/lib. Longer term, the tests should be folded into the existing VFIO tests. 
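For illustration only (not taken from the test itself), a client process that holds just the DMABUF fd might map it along the lines of the sketch below. MAP_SHARED is used because the exporter's mmap handler in this series rejects VMAs without VM_SHARED with -EINVAL; map_dmabuf_range() is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: map 'len' bytes of the DMABUF starting at byte
 * 'offset' (which must be page-aligned). Returns the mapping, or NULL
 * on failure with errno set by mmap().
 */
static void *map_dmabuf_range(int dmabuf_fd, size_t len, off_t offset)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       dmabuf_fd, offset);

	return p == MAP_FAILED ? NULL : p;
}
```

Per the handlers in this series, a new mmap() of a revoked buffer is expected to fail with ENODEV, and a fault on an existing mapping of a revoked buffer raises SIGBUS, so a client that touches the mapping after revocation needs to be prepared for that.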
Signed-off-by: Matt Evans --- tools/testing/selftests/vfio/Makefile | 1 + .../vfio/standalone/vfio_dmabuf_mmap_test.c | 837 ++++++++++++++++++ 2 files changed, 838 insertions(+) create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mma= p_test.c diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftest= s/vfio/Makefile index 8e90e409e91d..8679d96e5b92 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -12,6 +12,7 @@ TEST_GEN_PROGS +=3D vfio_iommufd_setup_test TEST_GEN_PROGS +=3D vfio_pci_device_test TEST_GEN_PROGS +=3D vfio_pci_device_init_perf_test TEST_GEN_PROGS +=3D vfio_pci_driver_test +TEST_GEN_PROGS +=3D standalone/vfio_dmabuf_mmap_test =20 TEST_FILES +=3D scripts/cleanup.sh TEST_FILES +=3D scripts/lib.sh diff --git a/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.= c b/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c new file mode 100644 index 000000000000..0c087497b777 --- /dev/null +++ b/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c @@ -0,0 +1,837 @@ +/* + * Tests for VFIO DMABUF userspace mmap() + * + * As well as the basics (mmap() a BAR resource to userspace), test + * shutdown/unmapping, aliasing, and DMABUF revocation scenarios. + * + * This test relies on being attached to a QEMU EDU device (for a + * simple known MMIO layout). 
Example invocation, assuming function + * 0000:00:03.0 is the target: + * + * # lspci -n -s 00:03.0 + * 00:03.0 00ff: 1234:11e8 (rev 10) + * + * # readlink /sys/bus/pci/devices/0000\:00\:03.0/iommu_group + * ../../../../../kernel/iommu_groups/3 + * + * (if there's a driver already attached) + * # echo 0000:00:03.0 > /sys/bus/pci/devices/0000:00:03.0/driver/unbind + * + * (and you might also need) + * # echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interr= upts + * + * Attach to VFIO: + * # echo 1234 11e8 > /sys/bus/pci/drivers/vfio-pci/new_id + * + * There should be only one thing in the group: + * # ls /sys/bus/pci/devices/0000:00:03.0/iommu_group/devices + * + * Given the above, an invocation would be: + * # this_test -r 0000:00:03.0 -g 3 + * + * However, note the QEMU EDU device has a very small address span of + * useful things in BAR0, which makes testing a non-zero BAR offset + * impossible. An "extended EDU" device is supported, which just + * presents a large chunk of memory as a second BAR resource: this + * allows non-zero BAR offsets to be tested. See below for a QEMU + * diff... + * + * Copyright (c) Meta Platforms, Inc. and affiliates. + * + * This software may be used and distributed according to the terms of the + * GNU General Public License version 2. 
+ */ + +/* +diff --git a/hw/misc/edu.c b/hw/misc/edu.c +index cece633e11..5f119e0642 100644 +--- a/hw/misc/edu.c ++++ b/hw/misc/edu.c +@@ -47,6 +47,7 @@ DECLARE_INSTANCE_CHECKER(EduState, EDU, + struct EduState { + PCIDevice pdev; + MemoryRegion mmio; ++ MemoryRegion ram; +=20 + QemuThread thread; + QemuMutex thr_mutex; +@@ -386,7 +387,12 @@ static void pci_edu_realize(PCIDevice *pdev, Error **= errp) +=20 + memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu, + "edu-mmio", 1 * MiB); ++ memory_region_init_ram(&edu->ram, OBJECT(edu), "edu-ram", 64 * MiB, &= error_fatal); + pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio); ++ pci_register_bar(pdev, 1, ++ PCI_BASE_ADDRESS_SPACE_MEMORY | ++ PCI_BASE_ADDRESS_MEM_PREFETCH | ++ PCI_BASE_ADDRESS_MEM_TYPE_64, &edu->ram); + } +=20 + static void pci_edu_uninit(PCIDevice *pdev) +*/ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define ROUND_UP(x, to) (((x) + (to) - 1) & ~((to) - 1)) +#define MiB(x) ((x) * 1024ULL * 1024) + +#define EDU_REG_MAGIC 0x00 +#define EDU_MAGIC_VAL 0x010000edu +#define EDU_REG_INVERT 0x04 + +#define FAIL_IF(cond, msg...) \ + do { \ + if (cond) { \ + printf("\n\nFAIL:\t"); \ + printf(msg); \ + exit(1); \ + } \ + } while (0) + +static int vfio_setup(int groupnr, char *rid_str, + struct vfio_region_info *out_mappable_regions, + int nr_regions, int *out_nr_regions, int *out_vfio_cfd, + int *out_vfio_devfd) +{ + /* Create a new container, add group to it, open device, read + * resource, reset, etc. 
Based on the example code in + * Documentation/driver-api/vfio.rst + */ + + int container =3D open("/dev/vfio/vfio", O_RDWR); + + int r =3D ioctl(container, VFIO_GET_API_VERSION); + + if (r !=3D VFIO_API_VERSION) { + /* Unknown API version */ + printf("-E- Unknown API ver %d\n", r); + return 1; + } + + if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) !=3D 1) { + printf("-E- Doesn't support type 1\n"); + return 1; + } + + char devpath[PATH_MAX]; + + snprintf(devpath, PATH_MAX - 1, "/dev/vfio/%d", groupnr); + /* Open the group */ + int group =3D open(devpath, O_RDWR); + + if (group < 0) { + printf("-E- Can't open VFIO device (group %d)\n", groupnr); + return 1; + } + + /* Test the group is viable and available */ + struct vfio_group_status group_status =3D { .argsz =3D sizeof( + group_status) }; + + if (ioctl(group, VFIO_GROUP_GET_STATUS, &group_status)) { + perror("-E- Can't get group status"); + return 1; + } + + if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) { + /* Group is not viable (ie, not all devices bound for vfio) */ + printf("-E- Group %d is not viable!\n", groupnr); + return 1; + } + + /* Add the group to the container */ + if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container)) { + perror("-E- Can't add group to container"); + return 1; + } + + /* Enable the IOMMU model we want */ + if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)) { + perror("-E- Can't select T1"); + return 1; + } + + /* Get additional IOMMU info */ + struct vfio_iommu_type1_info iommu_info =3D { .argsz =3D sizeof( + iommu_info) }; + + if (ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info)) { + perror("-E- Can't get VFIO info"); + return 1; + } + + /* Get a file descriptor for the device */ + int device =3D ioctl(group, VFIO_GROUP_GET_DEVICE_FD, rid_str); + + if (device < 0) { + perror("-E- Can't get device fd"); + return 1; + } + close(group); + + /* Test and setup the device */ + struct vfio_device_info device_info =3D { .argsz =3D sizeof(device_info) = }; + + if 
(ioctl(device, VFIO_DEVICE_GET_INFO, &device_info)) { + perror("-E- Can't get device info"); + return 1; + } + printf("-i- %d device regions, flags 0x%x\n", device_info.num_regions, + device_info.flags); + + /* Regions are BAR0-5 then ROM, config, VGA */ + int out_region =3D 0; + + for (int i =3D 0; i < device_info.num_regions; i++) { + struct vfio_region_info reg =3D { .argsz =3D sizeof(reg) }; + + reg.index =3D i; + + if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg)) { + /* We expect EINVAL if there's no VGA region */ + printf("-W- Region %d: ERROR %d\n", i, errno); + } else { + printf("-i- Region %d: flags 0x%08x (%c%c%c), cap_offs %d, size 0x%llx,= offs 0x%llx\n", + i, reg.flags, + (reg.flags & VFIO_REGION_INFO_FLAG_READ) ? 'R' : + '-', + (reg.flags & VFIO_REGION_INFO_FLAG_WRITE) ? 'W' : + '-', + (reg.flags & VFIO_REGION_INFO_FLAG_MMAP) ? 'M' : + '-', + reg.cap_offset, reg.size, reg.offset); + + if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) && + (out_region < nr_regions)) + out_mappable_regions[out_region++] =3D reg; + } + } + *out_nr_regions =3D out_region; + +#ifdef THERE_ARE_NO_IRQS_YET + for (i =3D 0; i < device_info.num_irqs; i++) { + struct vfio_irq_info irq =3D { .argsz =3D sizeof(irq) }; + + irq.index =3D i; + + ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq); + + /* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */ + } +#endif + /* Gratuitous device reset and go... 
*/ + if (ioctl(device, VFIO_DEVICE_RESET)) + perror("-W- Can't reset device (continuing)"); + + *out_vfio_cfd =3D container; + *out_vfio_devfd =3D device; + + return 0; +} + +static int vfio_feature_present(int dev_fd, uint32_t feature) +{ + struct vfio_device_feature probeftr =3D { + .argsz =3D sizeof(probeftr), + .flags =3D VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_GET | + feature, + }; + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &probeftr) =3D=3D 0; +} + +static int vfio_create_dmabuf(int dev_fd, uint32_t region, uint64_t offset, + uint64_t length) +{ + uint64_t ftrbuf + [ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_dma_buf) + + sizeof(struct vfio_region_dma_range), + 8) / + 8]; + + struct vfio_device_feature *f =3D (struct vfio_device_feature *)ftrbuf; + struct vfio_device_feature_dma_buf *db =3D + (struct vfio_device_feature_dma_buf *)f->data; + struct vfio_region_dma_range *range =3D + (struct vfio_region_dma_range *)db->dma_ranges; + + f->argsz =3D sizeof(ftrbuf); + f->flags =3D VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF; + db->region_index =3D region; + db->open_flags =3D O_RDWR | O_CLOEXEC; + db->flags =3D 0; + db->nr_ranges =3D 1; + range->offset =3D offset; + range->length =3D length; + + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &ftrbuf); +} + +/* As above, but try multiple ranges in one dmabuf */ +static int vfio_create_dmabuf_dual(int dev_fd, uint32_t region, + uint64_t offset0, uint64_t length0, + uint64_t offset1, uint64_t length1) +{ + uint64_t ftrbuf + [ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_dma_buf) + + (sizeof(struct vfio_region_dma_range) * 2), + 8) / + 8]; + + struct vfio_device_feature *f =3D (struct vfio_device_feature *)ftrbuf; + struct vfio_device_feature_dma_buf *db =3D + (struct vfio_device_feature_dma_buf *)f->data; + struct vfio_region_dma_range *range =3D + (struct vfio_region_dma_range *)db->dma_ranges; + + f->argsz =3D sizeof(ftrbuf); + 
f->flags =3D VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF; + db->region_index =3D region; + db->open_flags =3D O_RDWR | O_CLOEXEC; + db->flags =3D 0; + db->nr_ranges =3D 2; + range[0].offset =3D offset0; + range[0].length =3D length0; + range[1].offset =3D offset1; + range[1].length =3D length1; + + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &ftrbuf); +} + +static volatile uint32_t *mmap_resource_aligned(size_t size, + unsigned long align, int fd, + unsigned long offset) +{ + void *v; + + if (align <=3D getpagesize()) { + v =3D mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, + offset); + FAIL_IF(v =3D=3D MAP_FAILED, + "Can't mmap fd %d (size 0x%lx, offset 0x%lx), %d\n", fd, + size, offset, errno); + } else { + size_t resv_size =3D size + align; + void *resv =3D + mmap(0, resv_size, 0, MAP_PRIVATE | MAP_ANON, -1, 0); + FAIL_IF(resv =3D=3D MAP_FAILED, + "Can't mmap reservation, size 0x%lx, %d\n", resv_size, + errno); + + uintptr_t pos =3D ((uintptr_t)resv + (align - 1)) & ~(align - 1); + + v =3D mmap((void *)pos, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_FIXED, fd, offset); + FAIL_IF(v =3D=3D MAP_FAILED, + "Can't mmap-fixed fd %d (size 0x%lx, offset 0x%lx), %d\n", + fd, size, offset, errno); + madvise((void *)v, size, MADV_HUGEPAGE); + + /* Tidy */ + if (pos > (uintptr_t)resv) + munmap(resv, pos - (uintptr_t)resv); + if (pos + size < (uintptr_t)resv + resv_size) + munmap((void *)pos + size, + (uintptr_t)resv + resv_size - (pos + size)); + } + + return (volatile uint32_t *)v; +} + +static volatile uint32_t *mmap_resource(size_t size, int fd, + unsigned long offset) +{ + return mmap_resource_aligned(size, getpagesize(), fd, offset); +} + +static void check_mmio(volatile uint32_t *base) +{ + static uint32_t magic =3D 0xdeadbeef; + uint32_t v; + + printf("-i- MMIO check: "); + + /* Trivial MMIO */ + v =3D base[EDU_REG_MAGIC / 4]; + FAIL_IF(v !=3D EDU_MAGIC_VAL, + "Magic value %08x incorrect, BAR map bad?\n", v); + + base[EDU_REG_INVERT / 4] =3D 
magic; + v =3D base[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~magic, "Inverterizer value %08x bad (should be %08x)\n", + v, ~magic); + printf("OK\n"); + + magic =3D (magic << 1) ^ (magic >> 1) ^ (magic << 7); +} + +static int revoke_dmabuf(int dev_fd, int dmabuf_fd) +{ + struct vfio_pci_dmabuf_revoke dmabuf_rev =3D { + .argsz =3D sizeof(dmabuf_rev), + .dmabuf_fd =3D dmabuf_fd, + }; + return ioctl(dev_fd, VFIO_DEVICE_PCI_DMABUF_REVOKE, &dmabuf_rev); +} + +static sigjmp_buf jmpbuf; + +static void sighandler(int sig) +{ + printf("*** Signal %d ***\n", sig); + siglongjmp(jmpbuf, sig); +} + +static void setup_signals(void) +{ + struct sigaction sa =3D { + .sa_handler =3D sighandler, + .sa_flags =3D 0, + }; + + sigaction(SIGBUS, &sa, NULL); +} + +static int vfio_dmabuf_test(int groupnr, char *rid_str) +{ + /* Only expecting one or two regions */ + struct vfio_region_info bar_region[2]; + int num_regions =3D 0; + int container_fd, dev_fd; + int r =3D vfio_setup(groupnr, rid_str, &bar_region[0], 2, &num_regions, + &container_fd, &dev_fd); + + FAIL_IF(r, "VFIO setup failed\n"); + FAIL_IF(!vfio_feature_present(dev_fd, VFIO_DEVICE_FEATURE_DMA_BUF), + "VFIO DMABUF support not available\n"); + + printf("-i- Container fd %d, device fd %d, and got DMA_BUF\n", + container_fd, dev_fd); + + setup_signals(); + + /////////////////////////////////////////////////////////////////////////= /////// + + /* Real basics: create DMABUF, and mmap it, and access MMIO through it. + * Do this for 2nd BAR if present, too (just plain memory). 
+ */ + printf("\nTEST: Create DMABUF, map it\n"); + int bar_db_fd =3D vfio_create_dmabuf(dev_fd, /* region */ 0, + /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar0 =3D + mmap_resource(bar_region[0].size, bar_db_fd, 0); + + printf("-i- Mapped DMABUF BAR0 at %p+0x%llx\n", dbbar0, + bar_region[0].size); + check_mmio(dbbar0); + + /* TEST: Map the traditional VFIO one _second_; it should still work. */ + printf("\nTEST: Map the regular VFIO BAR\n"); + volatile uint32_t *vfiobar =3D + mmap_resource(bar_region[0].size, dev_fd, bar_region[0].offset); + + printf("-i- Mapped VFIO BAR0 at %p+0x%llx\n", vfiobar, + bar_region[0].size); + check_mmio(vfiobar); + + /* Test plan: + * + * - Revoke the first DMABUF, check for fault + * - Check VFIO BAR access still works + * - Revoke first DMABUF fd again: -EBADFD + * - create new DMABUF for same (previously-revoked) region: accessible + * + * - Create overlapping DMABUFs: map success, maps alias OK + * - Create a second mapping of the second DMABUF, maps alias OK + * - Destroy one by revoking through a dup()ed fd: check mapping revoked + * - Check original is still accessible + * + * If we have a larger (>4K of accessible stuff!) 
second BAR resource: + * - Map it, create an overlapping alias with offset !=3D 0 + * - Check alias/offset is sane + * + * Last: + * - close container_fd and dev_fd: check DMABUF mapping revoked + * - try revoking a non-DMABUF fd: -EINVAL + */ + + printf("\nTEST: Revocation of first DMABUF\n"); + r =3D revoke_dmabuf(dev_fd, bar_db_fd); + FAIL_IF(r !=3D 0, "Can't revoke: %d\n", errno); + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + // Try an access: expect BOOM + check_mmio(dbbar0); + FAIL_IF(true, "Expecting fault after revoke!\n"); + } + printf("-i- Revoked OK\n"); + + printf("\nTEST: Access through VFIO-mapped region still works\n"); + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(vfiobar); + else + FAIL_IF(true, "Expecting VFIO-mapped BAR to still work!\n"); + + printf("\nTEST: Double-revoke\n"); + r =3D revoke_dmabuf(dev_fd, bar_db_fd); + FAIL_IF(r !=3D -1 || errno !=3D EBADFD, + "Expecting 2nd revoke to give EBADFD, got %d errno %d\n", r, + errno); + printf("-i- Correctly failed second revoke\n"); + + printf("\nTEST: Can't mmap() revoked DMABUF\n"); + void *dbfail =3D mmap(0, bar_region[0].size, PROT_READ | PROT_WRITE, + MAP_SHARED, bar_db_fd, 0); + FAIL_IF(dbfail !=3D MAP_FAILED, "mmap() should fail\n"); + printf("-i- OK\n"); + + printf("\nTEST: Recreate new DMABUF for previously-revoked region\n"); + int bar_db_fd_2 =3D vfio_create_dmabuf( + dev_fd, /* region */ 0, /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd_2 < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar0_2 =3D + mmap_resource(bar_region[0].size, bar_db_fd_2, 0); + + printf("-i- Mapped 2nd DMABUF BAR0 at %p+0x%llx\n", dbbar0_2, + bar_region[0].size); + check_mmio(dbbar0_2); + + munmap((void *)dbbar0, bar_region[0].size); + close(bar_db_fd); + + printf("\nTEST: Create aliasing/overlapping DMABUF\n"); + int bar_db_fd_3 =3D vfio_create_dmabuf( + dev_fd, /* region */ 0, /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd_3 < 0, "Can't create DMABUF, %d\n", errno); + + 
volatile uint32_t *dbbar0_3 =3D + mmap_resource(bar_region[0].size, bar_db_fd_3, 0); + + printf("-i- Mapped 3rd DMABUF BAR0 at %p+0x%llx\n", dbbar0_3, + bar_region[0].size); + check_mmio(dbbar0_3); + + /* Basic aliasing check: Write value through 2nd, read back through 3rd */ + uint32_t v; + + dbbar0_2[EDU_REG_INVERT / 4] =3D 0xfacecace; + v =3D dbbar0_3[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~0xfacecace, + "Alias inverted MMIO value %08x bad (should be %08x)\n", v, + ~0xfacecace); + printf("-i- Aliasing DMABUF OK\n"); + + printf("\nTEST: Create a double-mapping of DMABUF\n"); + /* Create another mmap of the existing aliasing DMABUF fd */ + volatile uint32_t *dbbar0_3_2 =3D + mmap_resource(bar_region[0].size, bar_db_fd_3, 0); + + printf("-i- Mapped 3rd DMABUF BAR0 _again_ at %p+0x%llx\n", dbbar0_3_2, + bar_region[0].size); + /* Can we see the value we wrote before? */ + v =3D dbbar0_3_2[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~0xfacecace, + "Alias alias inverted MMIO value %08x bad (should be %08x)\n", + v, ~0xfacecace); + check_mmio(dbbar0_3_2); + + printf("\nTEST: revoke aliasing DMABUF through dup()ed fd\n"); + int dup_dbfd3 =3D dup(bar_db_fd_3); + + r =3D revoke_dmabuf(dev_fd, dup_dbfd3); + FAIL_IF(r !=3D 0, "Can't revoke: %d\n", errno); + + /* Both of the mmap()s made should now be gone */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_3); + FAIL_IF(true, "Expecting fault on 1st mmap after revoke!\n"); + } + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_3_2); + FAIL_IF(true, "Expecting fault on 2nd mmap after revoke!\n"); + } + printf("-i- Both aliasing DMABUF mappings revoked OK\n"); + + close(dup_dbfd3); + close(bar_db_fd_3); + munmap((void *)dbbar0_3, bar_region[0].size); + munmap((void *)dbbar0_3_2, bar_region[0].size); + + /* And finally, although the aliasing DMABUF is gone, access + * through the original one should still work: + */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(dbbar0_2); + else + FAIL_IF(true, + "Expecting 
original DMABUF mapping to still work!\n"); + printf("-i- Aliasing DMABUF removal OK, original still accessible\n"); + + /* If we're attached to a hacked/extended QEMU EDU device with + * a large memory region 1 then we can test things like + * offsets/aliasing. + */ + if (num_regions >=3D 2) { + printf("\nTEST: Second BAR: test overlapping+offset DMABUF\n"); + + printf("-i- Region 1 DMABUF: offset %llx, size %llx\n", + bar_region[1].offset, bar_region[1].size); + int bar1_db_fd =3D + vfio_create_dmabuf(dev_fd, 1, 0, bar_region[1].size); + + FAIL_IF(bar1_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar1 =3D mmap_resource_aligned( + bar_region[1].size, MiB(32), bar1_db_fd, 0); + printf("-i- Mapped DMABUF Region 1 at %p+0x%llx\n", dbbar1, + bar_region[1].size); + + /* Init with known values */ + for (unsigned long i =3D 0; i < (bar_region[1].size); + i +=3D getpagesize()) + dbbar1[i / 4] =3D 0xca77face ^ i; + + v =3D dbbar1[0]; + FAIL_IF(v !=3D 0xca77face, + "DB Region 1 read: Magic value %08x incorrect\n", v); + printf("-i- DB Region 1 read: Magic: 0x%08x\n", v); + + /* TEST: Overlap/aliasing; map same BAR with a range + * offset > 0. Also test disjoint/multi-range DMABUFs + * by creating a second range. This appears as one + * contiguous VA range mapped to a first BAR range + * (starting from range0_offset), then skipping a few + * physical pages, then a second range (starting at + * range1_offset). 
+ */ + unsigned long range0_offset =3D getpagesize() * 3; + unsigned long range1_skip_pages =3D 5; + unsigned long range1_skip =3D getpagesize() * range1_skip_pages; + unsigned long range_size =3D + (bar_region[1].size - range0_offset - range1_skip) / 2; + unsigned long range1_offset =3D + range0_offset + range_size + range1_skip; + unsigned long map_size =3D range_size * 2; + + printf("\nTEST: Second BAR aliasing mapping, two ranges size 0x%lx:\n\t\= t0x%lx-0x%lx, 0x%lx-0x%lx\n", + range_size, range0_offset, range0_offset + range_size, + range1_offset, range1_offset + range_size); + + int bar1_2_db_fd =3D vfio_create_dmabuf_dual( + dev_fd, 1, range0_offset, range_size, range1_offset, + range_size); + FAIL_IF(bar1_2_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar1_2 =3D + mmap_resource(map_size, bar1_2_db_fd, 0); + + printf("-i- Mapped DMABUF Region 1 alias at %p+0x%lx\n", + dbbar1_2, map_size); + FAIL_IF(dbbar1_2[0] !=3D dbbar1[range0_offset / 4], + "slice2 value mismatch\n"); + + dbbar1[(range0_offset + 4) / 4] =3D 0xfacef00d; + /* Check we can see the value written above at +offset + * from offset 0 of this mapping (since the DMABUF + * itself is offsetted): + */ + v =3D dbbar1_2[4 / 4]; + FAIL_IF(v !=3D 0xfacef00d, + "DB Region 1 alias read: Magic value %08x incorrect\n", + v); + printf("-i- DB Region 1 alias read: Magic 0x%08x, OK\n", v); + + /* Read back the known values across the two + * sub-ranges of the dbbar1_2 mapping, accounting for + * the physical pages skipped between them + */ + for (unsigned long i =3D 0; i < range_size; i +=3D getpagesize()) { + unsigned long t =3D i + range0_offset; + uint32_t want =3D (0xca77face ^ t); + + v =3D dbbar1_2[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from range0 +%08lx (real %08lx)\n", + want, v, i, t); + } + for (unsigned long i =3D range_size; i < (range_size * 2); + i +=3D getpagesize()) { + unsigned long t =3D i + range1_offset - range_size; + uint32_t want =3D 
(0xca77face ^ t); + + v =3D dbbar1_2[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from range1 +%08lx (real %08lx)\n", + want, v, i, t); + } + + printf("\nTEST: Third BAR aliasing mapping, testing mmap() non-zero offs= et:\n"); + + unsigned long smaller =3D range_size - 0x1000; + volatile uint32_t *dbbar1_3 =3D mmap_resource_aligned( + smaller, MiB(32), bar1_2_db_fd, range_size); + printf("-i- Mapped DMABUF Region 1 range 1 alias at %p+0x%lx\n", + dbbar1_3, smaller); + + for (unsigned long i =3D 0; i < smaller; i +=3D getpagesize()) { + unsigned long t =3D i + range1_offset; + uint32_t want =3D (0xca77face ^ t); + + v =3D dbbar1_3[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from 3rd range1 +%08lx (real %08lx)\n", + want, v, i, t); + } + printf("-i- mmap offset OK\n"); + + /* TODO: If we can observe hugepages (mechanically, + * rather than human reading debug), we can test + * interesting alignment cases for the PFN search: + * + * - Deny hugepages at start/end of an mmap() that + * starts/ends at non-HP-aligned addresses + * (e.g. first pages are small, middle is fully + * aligned in VA and PFN so 2M, and buffer finishes + * before 2M boundary, so last pages are small). + * + * - Everything aligned nicely except the mmap() size + * is <2MB, so hugepage denied due to straddling + * end. + * + * - Buffer offsets into BAR not aligned, so no huge + * mappings even if mmap() is perfectly aligned. + */ + + /* Check that access after DMABUF fd close still works + * (VMA still holds refcount, obvs!) 
+ */ + close(bar1_2_db_fd); + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + v =3D dbbar1_2[0x4 / 4]; + else + FAIL_IF(true, + "Expecting original DMABUF mapping to still work!\n"); + printf("-i- DB Region 1 alias read 2: Magic 0x%08x, OK\n", v); + printf("-i- Offset check OK\n"); + } + + printf("\nTEST: Shutdown: close VFIO container/device fds, check DMABUF g= one\n"); + + /* Final use of dev_fd: use it to try to revoke a non-DMABUF fd: */ + r =3D revoke_dmabuf(dev_fd, 1); + FAIL_IF(r !=3D -1 || errno !=3D EINVAL, + "Expecting revoke of stdout to give EINVAL, got %d errno %d\n", + r, errno); + printf("-i- Correctly failed final revoke\n"); + + /* Closing all uses of dev_fd (including the VFIO BAR mmap()!) + * will revoke the DMABUF; even though the DMABUF fd might + * remain open, the mapping itself is zapped. Start with a + * plain close (before unmapping the VFIO BAR mapping): + */ + close(dev_fd); + close(container_fd); + printf("-i- VFIO fds closed\n"); + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(dbbar0_2); + else + FAIL_IF(true, + "Expecting DMABUF mapping to still work if VFIO mapping still live!\n"); + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(vfiobar); + else + FAIL_IF(true, + "Expecting VFIO BAR mapping to still work after fd close!\n"); + + munmap((void *)vfiobar, bar_region[0].size); + printf("-i- VFIO BAR unmapped\n"); + + /* The final reference via VFIO should now be gone, and the + * DMABUF should now be destroyed. 
The mapping of it should + * be inaccessible: + */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_2); + FAIL_IF(true, + "Expecting DMABUF mapping to fault after VFIO fd shutdown!\n"); + } + printf("-i- DMABUF mappings inaccessible\n"); + + /* Ensure we can't mmap() DMABUF for closed device */ + void *dbfail2 =3D mmap(0, bar_region[0].size, PROT_READ | PROT_WRITE, + MAP_SHARED, bar_db_fd_2, 0); + FAIL_IF(dbfail2 !=3D MAP_FAILED, "mmap() should fail\n"); + printf("-i- Can't mmap DMABUF for closed device, OK\n"); + + munmap((void *)dbbar0_2, bar_region[0].size); + close(bar_db_fd_2); + + printf("\nPASS\n"); + + return 0; +} + +static void usage(char *me) +{ + printf("Usage:\t%s -g -r \n" + "\n" + "\t\tGroup is found via device path, e.g. cat /sys/bus/pci/devices= /0000:03:1d.0/iommu_group\n" + "\t\tRID is of the form 0000:03:1d.0\n" + "\n", + me); +} + +int main(int argc, char *argv[]) +{ + /* Get args: IOMMU group and BDF/path */ + int groupnr =3D -1; + char *rid_str =3D NULL; + int arg; + + while ((arg =3D getopt(argc, argv, "g:r:h")) !=3D -1) { + switch (arg) { + case 'g': + groupnr =3D atoi(optarg); + break; + + case 'r': + rid_str =3D strdup(optarg); + break; + case 'h': + default: + usage(argv[0]); + return 1; + } + } + + if (rid_str =3D=3D NULL || groupnr =3D=3D -1) { + usage(argv[0]); + return 1; + } + + printf("-i- Using group number %d, RID '%s'\n", groupnr, rid_str); + + return vfio_dmabuf_test(groupnr, rid_str); +} --=20 2.47.3