From nobody Tue Apr 7 17:13:28 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 534454657F1; Thu, 26 Feb 2026 20:22:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137380; cv=none; b=dEeJ4NNVRd3pY6A6YOj3YpqBPEBSpBeDNn7DC2Z8O4KHUoxb+91tLlKacrpbq9xHR+TeYkHxCa9qJFPxaJq86PxnZtFKrW4zJbJm80hVYCfhej+/n9mMFdZ7q9vSrQV6hEwu1Llzqe7/8v66koyi4vlZvZeVFLTSLvuA3R5WlqA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137380; c=relaxed/simple; bh=MlcS4+vFl+7wEZk4is3XZG96cVNjikLJ4HsQTnQbgFE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=pqOYNhwmiKnhz0RekOooNj8bLcX4DfgjhAsgEn6pN/UPYQnQQsw8MNNgP1iwfHTjrgbiOkwz2KRbSYBR6+pkGrxyU7xDNnR7T6WUCvq1JReTpjU3mTtdDwJ8HppFVeyVmJfmYp0t0b5V0q27ZpKnSkBHZKSY0bO0+sXOLkNf/J4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=Ro+ZAfv3; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="Ro+ZAfv3" Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.18.1.11/8.18.1.11) with ESMTP id 61QHgUEc1069359; Thu, 26 Feb 2026 12:22:45 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to 
:message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=l6eVE4IIlQ2xaNQUCFG/n3h8OW7Ucx8dDtjDv6E+Jjs=; b=Ro+ZAfv3aURa otkB5+Xl4tOzfJLXIrfoK35UVPYme0qKfjHf3pM/tPaVVMBdNKIRJSxCPkfwwfhv 9jiU+GP4IHqH6uMUPDODFXpmLWSWIyVrZlI69Kl2wohCL+S2QxOIpSXWMzjJ6SGW Lvy42qbnemzbo6LHc2amIBfl43NKaj8WLSsyW2Wl+BlpBG7rWHExCv0Pax1KI5nS 9Q7Ic1N8pLFsmeeGHbTbWR1AnmSUKRrG0bPemC03HAyA8rSlCthUCusNH6SLXAkZ fsxfZgBVz8I01VwFbbiNAJtjYo/P8a3uR1cGGHLEYV5zPJkr1tud1HIstc7SyYrx 5jVt7c2rAg== Received: from mail.thefacebook.com ([163.114.134.16]) by m0089730.ppops.net (PPS) with ESMTPS id 4cjnjmd8xh-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:22:45 -0800 (PST) Received: from localhost (2620:10d:c085:208::7cb7) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:43 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 1/7] vfio/pci: Ensure VFIO barmap is set up before creating a DMABUF Date: Thu, 26 Feb 2026 12:21:57 -0800 Message-ID: <20260226202211.929005-2-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX05FulneI4bIg +7g+ADgjmslrOy3Y6zyvkpP4uLK92e2XPCJoDVOnXju0UdbaphcqJJQmTBr9lgy3LxvR8R3vCqw y94aRZUl5oheUkI6Tujr4n0Sd6Tv5R4PjlAYtiKI9nb8GLdMJWSU2++5gwcVWSkskdKafOx4mLC 
8//zuPVN2h2x6GcSViKmNXhWaSM1Fou7juakmnNJad2qMhA2tJRs3zKyvFvC1ZXNEeSiatXpD5F WKXD/Wa7Ay8n/tk803hAgxBdDvzcivwbdZiwWFswENNPYe103iklgQbecmTHhEnCRVDIYaC4EII wqjP2ei0KRHJefv0+qchDcr0q/CvVivN2NS/l8QEWZgEDyIfhhCN/uDn5uu4I5HqDDlopth5U3I 7QfqDH1IVxZib83QyQJO+BWPmhVwqoUyq2KvP1Wu9CNpTM2huXCKTXKmBVK3AjNne+N5O2/0saj gKHVaysBOdm+c6SOw8A== X-Proofpoint-ORIG-GUID: AUPBuU8COn3qwoYaqyntQk0wsBNevAIo X-Authority-Analysis: v=2.4 cv=B/m0EetM c=1 sm=1 tr=0 ts=69a0ab95 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=TqTIxcMBvpabANa2:21 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=2V9JFKNtuBlljh9624AA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-GUID: AUPBuU8COn3qwoYaqyntQk0wsBNevAIo X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" A DMABUF exports access to BAR resources which need to be requested before the DMABUF is handed out. Usually the resources are requested when setting up the barmap when the VFIO device fd is mmap()ed, but there's no guarantee that happens before a DMABUF is created. Set up the barmap (and so request resources) in the DMABUF-creation path. Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO region= s") Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 4be4a85005cb..46ab64fbeb19 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -258,6 +258,17 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core= _device *vdev, u32 flags, goto err_free_priv; } =20 + /* + * Just like the vfio_pci_core_mmap() path, we need to ensure + * PCI regions have been requested before returning DMABUFs + * that reference them. 
It's possible to create a DMABUF for + * a BAR without the BAR having already been mmap()ed. The + * barmap setup requests the regions for us: + */ + ret =3D vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index); + if (ret) + goto err_free_phys; + priv->vdev =3D vdev; priv->nr_ranges =3D get_dma_buf.nr_ranges; priv->size =3D length; --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B88AE466B7E; Thu, 26 Feb 2026 20:22:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137380; cv=none; b=ugSWk4LNui8DVfnHRdXOtG37XL8hMiT/8zJoJtUv2dCyJYVbLmboHa+h2LlQnKonMMs+BvgnvpBiOnMj8ICblvaMbsTGiNDpyf41JBDkLpuRJPwAXcCp0nx+scr2ulpMk2iDFISqmeM59XgBLXsoiW3Jn3cO0mqPFyDeaC1z41U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137380; c=relaxed/simple; bh=FXtJ8uXbYjqLfElClLaW+FTDD2+Ilt7t5tsXROC01Dw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WjKnE5MSpiU1q5ka+qCaN/UmPJp8+E1tQpddSZFHR/oXjmkgt8JW7+rtHMEaWqmew4C4P6Do5R5VZUejm0nlvU9Fu0QZT/fanpA6UwGnMWLAR+dK5g8BCFHuwbbAK9eNbqxR94HcEDzh2Lx4BDpDz5dtsTfkMOvvhcJ/SR0gyms= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=KIUObFs6; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=meta.com header.i=@meta.com header.b="KIUObFs6" Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.18.1.11/8.18.1.11) with ESMTP id 61QHqVfh1068162; Thu, 26 Feb 2026 12:22:47 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=wgkQLaliju+4WFlO7qLaX85dVVOBwH0k4aL0NBvLBFY=; b=KIUObFs6mawO H8gW175OMdT7S//aDU67kzttjNBMHxP7P4BFzUTYXm26J9j4fVHsoucbNf/DY+pB MuBO+ADrIuEGtnjAyGVw0n91X/1JTp11lMGfEiitC7umRTZIavLOKYUq9oeHT1gE 0rUPJjqRr2x0dYv3d70A/CEEA99t75Voqzgn2ehpW3wGOzM5cOBQjdFuZbFfFaEH hlYDgy2UzBf3QfVluh17AKrGOhiH+4YVWoa8biTSoLa2YQlctCcVcQ+3CfbRc5XO 6owI48m10DRKX/Cyy8D7SVNxLpkuO4SQon0piBZpZ63v4CHZyW27yxuMEnAC0Fl7 Ofjh4gqxGw== Received: from mail.thefacebook.com ([163.114.134.16]) by m0089730.ppops.net (PPS) with ESMTPS id 4cjnjmd8y5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:22:47 -0800 (PST) Received: from localhost (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:46 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 2/7] vfio/pci: Clean up DMABUFs before disabling function Date: Thu, 26 Feb 2026 12:21:58 -0800 Message-ID: <20260226202211.929005-3-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX/tneem6RxYSY I63y2vdjAzVgmXqrSbpAn9wulSCZI9FtA/q8vCnvkyYZLcEsT5oTzQEj6M6vln7/2HgM12Ve4kq TjK2sLyRtY89P7N7j0PDW3XDE9U5QMhKh19n7tlc+X5KDm8d9p1bJEdz7Jnu+7Kv1DaggvEpQVz s6KVs3PQcPOqFATCUKBddJFPwd2d0iVLdCWz24BAJK0I54HO5qnF5xdsBkXZkY24Nc9gUBaZKV3 DmLDdRwTr4V4KIabstnOlq+a9IWs6Sp/vArSukdVwe2UftaHXwGxaXgVgeXYUDp2g+S0SnR/XdL ArJ7tVkxsP3JoOTU6TuznHmAnAdI3PIqpDC43TsKUe/c6yM9y7PLOtlJ2i3oLEHtrLQXhlArN8Z 5sfdSLxYLMAxv/kESr/r/CPMvzv1NHahHW4dibZj450rEk2+kzSZkbYdzmisDNWfyg9+zMryTex poMP2xMKKHpHd5VOIDg== X-Proofpoint-ORIG-GUID: ZCwoLGeJgQMU9zaWzcqZlJwB0q3GSBt_ X-Authority-Analysis: v=2.4 cv=B/m0EetM c=1 sm=1 tr=0 ts=69a0ab97 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=4DpnPWOeRWuCnonzwFEA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-GUID: ZCwoLGeJgQMU9zaWzcqZlJwB0q3GSBt_ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" On device shutdown, make vfio_pci_core_close_device() call vfio_pci_dma_buf_cleanup() before the function is disabled via vfio_pci_core_disable(). This ensures that any access to DMABUFs is revoked (and importers act on move_notify()) before the function's BARs become inaccessible. This fixes an issue where, if the function is disabled first, a tiny window exists in which the function's MSE is cleared and yet BARs could still be accessed via the DMABUF. Worse, the resources would also be free/up for grabs by a different driver. 
Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO region= s") Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 3a11e6f450f7..8d0e3605fbc7 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -726,10 +726,10 @@ void vfio_pci_core_close_device(struct vfio_device *c= ore_vdev) #if IS_ENABLED(CONFIG_EEH) eeh_dev_release(vdev->pdev); #endif - vfio_pci_core_disable(vdev); - vfio_pci_dma_buf_cleanup(vdev); =20 + vfio_pci_core_disable(vdev); + mutex_lock(&vdev->igate); vfio_pci_eventfd_replace_locked(vdev, &vdev->err_trigger, NULL); vfio_pci_eventfd_replace_locked(vdev, &vdev->req_trigger, NULL); --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7491A477994; Thu, 26 Feb 2026 20:23:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137397; cv=none; b=DKFVlIZmTZnSjAXkcePGBYYkB2KA7531kBCOD2u2n+a4QgpI1+PNqvR9IyS+DHuj8/UjQLxN5oFrVEYY3DKvuZV76Hsdf5zG7dESMysy7nHG/Ze+Vo3QFGf/qSkyF2g2Fi3FqZPY3yUPsIwgloJpt9ihdwZol6hJSv8B/k0IjV4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137397; c=relaxed/simple; bh=B0tCPKXsVIs/JUt43F+j/QlnMGVFmicRYHIqMI3wmuM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=jWVXw+AWXcV+UreO95ToQqn5zs01d68ST4zxrDwroNfwjA+dngV6+5qdxTFLwIFgPwKi/OnHk5pnmw2sRUdNC63cgZc2LPRMqExG9lARUOhOulaQz1sFEV1wYvXkl5YQTr32EWjtsl+TD1vuKdhwmeslcv9Nyi/IJIJQTL7DyHE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=JkuK1lla; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="JkuK1lla" Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 61QH7RIh1891443; Thu, 26 Feb 2026 12:23:02 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=4pcW/NWjzbNIWAsTEVkV5MDbaa7KgPXPuivmfTCyiik=; b=JkuK1llaFTM/ x+M4io9R69lEge2o1xUbabzu+pFp7vKwQcRi+TIxqA8ekeLfPMYLfAaXEIh4qPeo zK9bOBo7c0jXskyrY0LSCPGWi+V8nJ+2xv6rBZn1O9VxOwfS3MQ6j31NsilqElO9 p9PiZYkRDcVkVSKIYfDIjjDHAT9DZivg8TVwm0k3jk5+iLv6Igr7d0+KUCERAl/Q 9+oQayeCpBxFUqVV8gqjL5rsnKKrMdShVLSdl9ymEN5rOl6sBKW3dufnKybLN+4C Xa+xutxHgsCErgd2CbcZwcdni9zBoDoZk2M4LCPvPFB7J4yy/8Ai13mWqmmE9zrP nVEgbiIzEw== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cjmp9dnms-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:23:02 -0800 (PST) Received: from localhost (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:49 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit 
Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 3/7] vfio/pci: Support mmap() of a DMABUF Date: Thu, 26 Feb 2026 12:21:59 -0800 Message-ID: <20260226202211.929005-4-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: bo-yooN7D0hXen55CgCeotpAvMNRfdcN X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfXx2R/GJyuvV5T K5TukSw7N8Ug142bKFLuFSR0o4MQURhJ8Vuyk6l1XnmNpwq0blWsl3Jc0WlUjkdvPcV22SFZtAi PMrfRg2cKG87t0g2vExkU2dNBg8BD4hzSs+WZ3/5nAfPL7nfDmbt7KyVWsd74oc++0QPtJQDqyz T1wjiD7aU1N+JPRKtZ0R6evMWaIbolHNceKztQwXP0AL3fx0wIl9QPuA+g6hab8pMcfkvt7CoFp cjueyRZsP/qRbssITWt8W/6ObUypK1vJ36NLIaWXCK1tbHOJggJO2bzm5G7GogTOZ8tnidR9OGe 5hf+PBKISrmAL/nMosZNE91aJzRuQns+Yll/dAeqlRvWUaI0AelGWw429teCDx4IM88fMY3in1q rXZYR3YUTPnOp8yr5LliMWAeTF8MbWLV8KmlRnafj9azYwCNDjc6WSnG6++VWqjm5O9F6w98QJP SoSfAMRwDjcsEOHKzCg== X-Proofpoint-ORIG-GUID: bo-yooN7D0hXen55CgCeotpAvMNRfdcN X-Authority-Analysis: v=2.4 cv=abZsXBot c=1 sm=1 tr=0 ts=69a0aba6 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=mBpQAGpv64ugA0MnHpoA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" A VFIO DMABUF can export a subset of a BAR to userspace by fd; add support for mmap() of this fd. This provides another route for a process to map BARs, except one where the process can only map a specific subset of a BAR represented by the exported DMABUF. 
mmap() support enables userspace driver designs that safely delegate access to BAR sub-ranges to other client processes by sharing a DMABUF fd, without having to share the (omnipotent) VFIO device fd with them. The mmap callback installs vm_ops callbacks for .fault and .huge_fault; they find a PFN by searching the DMABUF's physical ranges. That is, DMABUFs with multiple ranges are supported for mmap(). Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 219 +++++++++++++++++++++++++++++ 1 file changed, 219 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index 46ab64fbeb19..bebb496bd0f2 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -85,6 +85,209 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dm= abuf) kfree(priv); } =20 +static int vfio_pci_dma_buf_find_pfn(struct device *dev, + struct vfio_pci_dma_buf *vpdmabuf, + struct vm_area_struct *vma, + unsigned long address, + unsigned int order, + unsigned long *out_pfn) +{ + /* + * Given a VMA (start, end, pgoffs) and a fault address, + * search phys_vec[] to find the range representing the + * address's offset into the VMA (and so a PFN). + * + * The phys_vec ranges represent contiguous spans of VAs + * upwards from the buffer offset 0; the actual PFNs might be + * in any order, overlap/alias, etc. Calculate an offset of + * the desired page given VMA start/pgoff and address, then + * search upwards from 0 to find which span contains it. + * + * On success, a valid PFN for a page sized by 'order' is + * returned into out_pfn. + * + * Failure occurs if: + * - The page would cross the edge of the VMA + * - The page isn't entirely contained within a range + * - We find a range, but the final PFN isn't aligned to the + * requested order. + * + * (Upon failure, the caller is expected to try again with a + * smaller order; the tests above will always succeed for + * order=3D0 as the limit case.) 
+ * + * It's suboptimal if DMABUFs are created with neighbouring + * ranges that are physically contiguous, since hugepages + * can't straddle range boundaries. (The construction of the + * ranges vector should merge such ranges.) + */ + + unsigned long rounded_page_addr =3D address & ~((PAGE_SIZE << order) - 1); + unsigned long rounded_page_end =3D rounded_page_addr + (PAGE_SIZE << orde= r); + unsigned long buf_page_offset; + unsigned long buf_offset =3D 0; + unsigned int i; + + if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) + return -EAGAIN; + + if (unlikely(check_add_overflow(rounded_page_addr - vma->vm_start, + vma->vm_pgoff << PAGE_SHIFT, &buf_page_offset))) + return -EFAULT; + + for (i =3D 0; i < vpdmabuf->nr_ranges; i++) { + unsigned long range_len =3D vpdmabuf->phys_vec[i].len; + unsigned long range_start =3D vpdmabuf->phys_vec[i].paddr; + + if (buf_page_offset >=3D buf_offset && + buf_page_offset + (PAGE_SIZE << order) <=3D buf_offset + range_len) { + /* + * The faulting page is wholly contained + * within the span represented by the range. + * Validate PFN alignment for the order: + */ + unsigned long pfn =3D (range_start >> PAGE_SHIFT) + + ((buf_page_offset - buf_offset) >> PAGE_SHIFT); + + if (IS_ALIGNED(pfn, 1 << order)) { + *out_pfn =3D pfn; + return 0; + } + /* Retry with smaller order */ + return -EAGAIN; + } + buf_offset +=3D range_len; + } + + /* + * If we get here, the address fell outside of the span + * represented by the (concatenated) ranges. This can + * never happen because vfio_pci_dma_buf_mmap() checks that + * the VMA is <=3D the total size of the ranges. + * + * But if it does, force SIGBUS for the access, and warn. 
+ */ + WARN_ONCE(1, "No range for addr 0x%lx, order %d: VMA 0x%lx-0x%lx pgoff 0x= %lx, %d ranges, size 0x%lx\n", + address, order, vma->vm_start, vma->vm_end, vma->vm_pgoff, + vpdmabuf->nr_ranges, vpdmabuf->size); + + return -EFAULT; +} + +static vm_fault_t vfio_pci_dma_buf_mmap_huge_fault(struct vm_fault *vmf, + unsigned int order) +{ + struct vm_area_struct *vma =3D vmf->vma; + struct vfio_pci_dma_buf *priv =3D vma->vm_private_data; + struct vfio_pci_core_device *vdev; + unsigned long pfn =3D 0; + vm_fault_t ret =3D VM_FAULT_FALLBACK; + + vdev =3D READ_ONCE(priv->vdev); + + /* + * A fault for an existing mmap might occur after + * vfio_pci_dma_buf_cleanup() has revoked and destroyed the + * vdev's DMABUFs, and annulled vdev. After creation, vdev is + * only ever written in cleanup. + */ + if (!vdev) + return VM_FAULT_SIGBUS; + + int r =3D vfio_pci_dma_buf_find_pfn(&vdev->pdev->dev, priv, vma, + vmf->address, order, &pfn); + + if (r =3D=3D 0) { + scoped_guard(rwsem_read, &vdev->memory_lock) { + /* Deal with the possibility of a fault racing + * with vfio_pci_dma_buf_move() revoking and + * then unmapping the buffer. The + * revocation/unmap and status change occurs + * whilst holding memory_lock. 
+ */ + if (priv->revoked) + ret =3D VM_FAULT_SIGBUS; + else + ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); + } + } else if (r !=3D -EAGAIN) { + ret =3D VM_FAULT_SIGBUS; + } + + dev_dbg_ratelimited(&vdev->pdev->dev, + "%s(order =3D %d) PFN 0x%lx, VA 0x%lx, pgoff 0x%lx: 0x%x\n", + __func__, order, pfn, vmf->address, vma->vm_pgoff, (unsigned int)re= t); + + return ret; +} + +static vm_fault_t vfio_pci_dma_buf_mmap_page_fault(struct vm_fault *vmf) +{ + return vfio_pci_dma_buf_mmap_huge_fault(vmf, 0); +} + +static const struct vm_operations_struct vfio_pci_dma_buf_mmap_ops =3D { + .fault =3D vfio_pci_dma_buf_mmap_page_fault, +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP + .huge_fault =3D vfio_pci_dma_buf_mmap_huge_fault, +#endif +}; + +static bool vfio_pci_dma_buf_is_mappable(struct dma_buf *dmabuf) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + + /* + * Sanity checks at mmap() time; alignment has already been + * asserted by validate_dmabuf_input(). + * + * Although the revoked state is transient, refuse to map a + * revoked buffer to flag early that something odd is going + * on: for example, users should not be mmap()ing a buffer + * that's being moved [by a user-triggered activity]. + */ + if (priv->revoked) + return false; + + return true; +} + +/* + * Similar to vfio_pci_core_mmap() for a regular VFIO device fd, but + * differs by pre-checks performed and ultimately the vm_ops installed. 
+ */ +static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_st= ruct *vma) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + u64 req_len, req_start; + + if (!vfio_pci_dma_buf_is_mappable(dmabuf)) + return -ENODEV; + if ((vma->vm_flags & VM_SHARED) =3D=3D 0) + return -EINVAL; + + req_len =3D vma->vm_end - vma->vm_start; + req_start =3D vma->vm_pgoff << PAGE_SHIFT; + + if (req_start + req_len > priv->size) + return -EINVAL; + + vma->vm_private_data =3D priv; + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); + vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); + + /* + * See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. + * + * FIXME: get mapping attributes from dmabuf? + */ + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | + VM_DONTEXPAND | VM_DONTDUMP); + vma->vm_ops =3D &vfio_pci_dma_buf_mmap_ops; + + return 0; +} + static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { .pin =3D vfio_pci_dma_buf_pin, .unpin =3D vfio_pci_dma_buf_unpin, @@ -92,6 +295,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { .map_dma_buf =3D vfio_pci_dma_buf_map, .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, .release =3D vfio_pci_dma_buf_release, + .mmap =3D vfio_pci_dma_buf_mmap, }; =20 /* @@ -335,6 +539,11 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device= *vdev, bool revoked) struct vfio_pci_dma_buf *tmp; =20 lockdep_assert_held_write(&vdev->memory_lock); + /* + * Holding memory_lock ensures a racing + * vfio_pci_dma_buf_mmap_*_fault() observes priv->revoked + * properly. 
+ */ =20 list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!get_file_active(&priv->dmabuf->file)) @@ -345,6 +554,14 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device= *vdev, bool revoked) priv->revoked =3D revoked; dma_buf_move_notify(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); + + /* + * Unmap any possible userspace mappings for a + * now-revoked DMABUF: + */ + if (revoked) + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); } fput(priv->dmabuf->file); } @@ -366,6 +583,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) priv->revoked =3D true; dma_buf_move_notify(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); vfio_device_put_registration(&vdev->vdev); fput(priv->dmabuf->file); } --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FBF342B73A; Thu, 26 Feb 2026 20:23:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137416; cv=none; b=P8576bpHlhzhnboO1a8VWff1WP0vlSbaNJf/1bwxcMf3NkiXQbgtOBtOEOYaJJQRswK2rm2eduVDz2UDCI+RNjQ55RmUy2+QO0nHE25GSyl3uiPCDRmyEyAnsxMAg6PwMReqYN4L6LuL+bAWYAwbGrKljK8cIe8Kz33kNLBamhA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137416; c=relaxed/simple; bh=ei5XNdXJKikl8/7jifxxAaS/BesM4w0YMmJ/rsrK2/A=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=q8caLHoU903oP+xhePjXSoN3o6CHR8xR1RWEeoS+L+k6T7jjbRANYFij9gzWM5JDT7zvRaxPwH8/C95rYEMM/19QYmwS9AFus85Wdw4y1+UXcCwZpj4eCC72oxqUkqIePVPdBbsWeyJtunUEjeL9E7Posa2Cx60lt5K+s6sZ2Dc= ARC-Authentication-Results: 
i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=jAK5Llk0; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="jAK5Llk0" Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 61QH7RIi1891443; Thu, 26 Feb 2026 12:23:03 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=Ipxihht3rtz9Tq9wOD86jRBR0utkMeevU4ePkGLhZLs=; b=jAK5Llk0qYpV CZJn6avQTWfS5A1ZnTGjr1ZjBl+rcDXhGdjiv0xfNFgNwIIYg6QFemLoLmnHwsuj 645X5vJx2JntNTMmsF7GnqdaSI9rW99y6n9bF7GtpCEQlsxT+5X888LNcJKzCVwn rO1ZR0wEfCH/HqjJTdtyroRJY2ueQY5VMcADYX2KqwgUUohoPCGQ1kzPpsEfDiJX F/ty1gStBCHTHNwUJCnZIxLHaUIZ/ao0wDTsBQh9CYZIVP8Xk9TEolInOW3JWvbf WgmmMKhQGtMeuaAi28ToVQrs8hLjkP+omDS8z9xRQYgQrIXDyyj7zfrw8qI8nVjV hQEtrIlv7w== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cjmp9dnms-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:23:03 -0800 (PST) Received: from localhost (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:51 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , 
=?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 4/7] dma-buf: uapi: Mechanism to revoke DMABUFs via ioctl() Date: Thu, 26 Feb 2026 12:22:00 -0800 Message-ID: <20260226202211.929005-5-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: 3U3DcNRGCAe6_3BsET19AsMQ1cy24XLl X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX8lmNm/Jl/UBZ vDCn1uiQX5OuErzlyFJYljwghvJ3HU6ZIrVIYEXf1k5ehQSurN5dCQe5oe1HtquJulGkb6YdbId rDNl4rvY8+rrwZ3isIymDRmTMf8F3L5LS+GE2LtTWwC2cxuV7k5QEP9M5Xhp60JlH9r4/CzwWO5 1wQgry6CgYt3uxuLm0OuOYSJPrhh4zkPtTRRyuP/SSB2O1/ugQkmMfTSaKsO49tClVnTvqu6G5W S8sP0TDyDpRzUqheqWyWy0P4MGMZN9NrmwtCVHObi5Ynk25rgwohKMKrOwJNWKbAeK44VzeYmcI 5l4B23+k6N/VN6cvHthLrkQq3V0oDpliH6J3dM9v2vNh10b/7dwtUBgkZ8m5bCR1w9P+/CNB3O9 CKCav36FYEKKqdUMb66wJXBT/HEGpxnNPqRUmV0i3YMfdWh96QszON1j4cbgn3jwMRDf0hrcN8c 3+IE/7pp62u5fmi9RdQ== X-Proofpoint-ORIG-GUID: 3U3DcNRGCAe6_3BsET19AsMQ1cy24XLl X-Authority-Analysis: v=2.4 cv=abZsXBot c=1 sm=1 tr=0 ts=69a0aba7 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=xZTfp-UuNEzyo7XdMfQA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Add a new dma-buf ioctl() op, DMA_BUF_IOCTL_REVOKE, connected to a new (optional) dma_buf_ops callback, revoke(). 
An exporter receiving this will _permanently_ revoke the DMABUF, meaning it can no longer be mapped/attached/mmap()ed. It also guarantees that existing importers have been detached (e.g. via move_notify) and all mappings made inaccessible. This is useful for lifecycle management in scenarios where a process has created a DMABUF representing a resource, then delegated it to a client process; access to the resource is revoked when the client is deemed "done", and the resource can be safely re-used elsewhere. Signed-off-by: Matt Evans --- drivers/dma-buf/dma-buf.c | 5 +++++ include/linux/dma-buf.h | 22 ++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 1 + 3 files changed, 28 insertions(+) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index edaa9e4ee4ae..b9b315317f2d 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -561,6 +561,11 @@ static long dma_buf_ioctl(struct file *file, case DMA_BUF_IOCTL_IMPORT_SYNC_FILE: return dma_buf_import_sync_file(dmabuf, (const void __user *)arg); #endif + case DMA_BUF_IOCTL_REVOKE: + if (dmabuf->ops->revoke) + return dmabuf->ops->revoke(dmabuf); + else + return -EINVAL; =20 default: return -ENOTTY; diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 0bc492090237..a68c9ad7aebd 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -277,6 +277,28 @@ struct dma_buf_ops { =20 int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map); void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map); + + /** + * @revoke: + * + * This callback is invoked from a userspace + * DMA_BUF_IOCTL_REVOKE operation, and requests that access to + * the buffer is immediately and permanently revoked. On + * successful return, the buffer is not accessible through any + * mmap() or dma-buf import. The request fails if the buffer + * is pinned; otherwise, the exporter marks the buffer as + * inaccessible and uses the move_notify callback to inform + * importers of the change. 
The buffer is permanently + * disabled, and the exporter must refuse all map, mmap, + * attach, etc. requests. + * + * Returns: + * + * 0 on success, or a negative error code on failure: + * -ENODEV if the associated device no longer exists/is closed. + * -EBADFD if the buffer has already been revoked. + */ + int (*revoke)(struct dma_buf *dmabuf); }; =20 /** diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h index 5a6fda66d9ad..84bf2dd2d0f3 100644 --- a/include/uapi/linux/dma-buf.h +++ b/include/uapi/linux/dma-buf.h @@ -178,5 +178,6 @@ struct dma_buf_import_sync_file { #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, __u64) #define DMA_BUF_IOCTL_EXPORT_SYNC_FILE _IOWR(DMA_BUF_BASE, 2, struct dma_b= uf_export_sync_file) #define DMA_BUF_IOCTL_IMPORT_SYNC_FILE _IOW(DMA_BUF_BASE, 3, struct dma_bu= f_import_sync_file) +#define DMA_BUF_IOCTL_REVOKE _IO(DMA_BUF_BASE, 4) =20 #endif --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9D6643CEF6; Thu, 26 Feb 2026 20:23:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137391; cv=none; b=jbGZE09+/g1cqARuRnyfe+9/OCuq7bBmusjOP8Dco4ojdIlHCKZIi0H49wD6sGe/Xx4ObkXfm20WWSZuoM7L3LuThnXI/ysi1eso9eZSCqnG+v99Zq6S3bYnpczzhN3nKKMW+mGZpHA/a60Qmrj9D/jk5BGk+hf5to9kS9OZl/s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137391; c=relaxed/simple; bh=+Ibv6ZoijxEBUoR6jKes6I8IXMVOZ4GYbc+1qL1Qp6I=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=u6qfaSe0h6QS1pwSTV4RncG0QJM0H/xBOU7GrhzKdowU6xNQWJKWCZN7TnEP2pe+JHBB+6SFN6eI1ElOx0PmXamtlwpPVxAmyZl6Pf957S4Gj+xOTTQw4LixFrRNGPhyKQyt7wD+dIh8c65rTHihJ50PttAW11XT5ujtjHzZ60A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=OZuPGg0X; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="OZuPGg0X" Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 61QG8R8c396710; Thu, 26 Feb 2026 12:22:54 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=NKJHnGt6sTOYI+IgqMNni7UAaJMAwyCJvuNAQUwxgmw=; b=OZuPGg0XPM2q /Oz2pcVO0V63tGXrKqWNtXhlSoscr5+L01hQPBMYBARvYE9xFWCj3x8ii+JlSvB0 IFaPoQhLdo0VXx1YEkBF7kmIEWqSNi1c4wktdnhxygVD35e9Ecbt6tY5ZDZdHfC2 PktUxG2L5nFgnH7z2zRiBLs+UaV4xGfcFw7IRbpEc4bnjh5IqKpAJEPqfPgtCm5P +epsiKhqNIXv0IJv0M5QBN/bBccy4TVAFOfArsC2NBKnTQO2X7kyxfPsd9r6yCDH MzEMEqTedX5a4GCbyvrugMSRWCidDlenTeuIZ5HC/oYCZtKMkxtRGUsE+M3F0VQ3 U8t/d7Tlyw== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cjseeay19-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:22:53 -0800 (PST) Received: from localhost (2620:10d:c0a8:1c::11) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:53 +0000 
From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 5/7] vfio/pci: Permanently revoke a DMABUF on request Date: Thu, 26 Feb 2026 12:22:01 -0800 Message-ID: <20260226202211.929005-6-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX+qHDWMDAKJRM DqWV5o8u8Bw6FcD1AkBzm1ls8II2cYUZ1X4NBM5+UA5hZr5zEp4dL0MWuCRaymlIJYDBV+bRePD stgDjHV+uVfdLCaO0UX3VulOcNcXbJDOKHI8bP1EZXg22rDzhjwWn6g3Vy80XM7ZikM1KnaA7Zz LqFPA6bechejyKgRymib618ZzEOzWr/qG7pPrwRj3qp0zR7ye56JiMvIHtKn7LgzdnIQfCgWdCS KJ940CqbfYk/9lncC0Mg4eubvlbk1Y1a5AfMhu9zkl3LJFzA8S90fPruc8NzaJh/ovIC1sdD1od Ak//Y5awoRANnsHYnkl47wcdEFem1vDRlKzhc1vnNdEwKsMFGCXBYm0H0aYhWVwEoQfy6JTriTh Mlsf0I6+LKKehNMmYAeYSw+tMscEEO24iMmfh00zAIVdfkg8Ao905jB8Kcau8krfAVdCNrNMejl pLr6J637v/bqttvq8Ew== X-Authority-Analysis: v=2.4 cv=df6NHHXe c=1 sm=1 tr=0 ts=69a0ab9d cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=gwrgxIBh24ckDezLgM0A:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-GUID: EHCRFiXG_--2Hgqi2T8-qZJviE0D8Yxk X-Proofpoint-ORIG-GUID: EHCRFiXG_--2Hgqi2T8-qZJviE0D8Yxk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Expand the VFIO DMABUF revocation state to 
three states: Not revoked, temporarily revoked, and permanently revoked. The first two are for existing transient revocation, e.g. across a function reset, and the DMABUF is put into the last in response to an ioctl(DMA_BUF_IOCTL_REVOKE) request. When triggered, dynamic imports are removed, PTEs zapped, and the state changed such that no future mappings/imports are allowed. This is useful to reclaim VFIO PCI BAR ranges previously delegated to a subordinate process: The driver process can ensure that the loans are closed down before repurposing exported ranges. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 64 +++++++++++++++++++++++++----- 1 file changed, 53 insertions(+), 11 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index bebb496bd0f2..af30ca205f31 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -9,6 +9,17 @@ =20 MODULE_IMPORT_NS("DMA_BUF"); =20 +enum vfio_pci_dma_buf_status { + /* + * A buffer can move freely between OK/accessible and revoked + * states (for example, a device reset will temporarily revoke + * it). It can also be permanently revoked. 
+ */ + VFIO_PCI_DMABUF_OK =3D 0, + VFIO_PCI_DMABUF_TEMP_REVOKED =3D 1, + VFIO_PCI_DMABUF_PERM_REVOKED =3D 2, +}; + struct vfio_pci_dma_buf { struct dma_buf *dmabuf; struct vfio_pci_core_device *vdev; @@ -17,9 +28,11 @@ struct vfio_pci_dma_buf { struct dma_buf_phys_vec *phys_vec; struct p2pdma_provider *provider; u32 nr_ranges; - u8 revoked : 1; + enum vfio_pci_dma_buf_status status; }; =20 +static int vfio_pci_dma_buf_revoke(struct dma_buf *dmabuf); + static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment) { return -EOPNOTSUPP; @@ -38,7 +51,7 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, if (!attachment->peer2peer) return -EOPNOTSUPP; =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 return 0; @@ -52,7 +65,7 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachmen= t, =20 dma_resv_assert_held(priv->dmabuf->resv); =20 - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return ERR_PTR(-ENODEV); =20 return dma_buf_phys_vec_to_sgt(attachment, priv->provider, @@ -205,7 +218,7 @@ static vm_fault_t vfio_pci_dma_buf_mmap_huge_fault(stru= ct vm_fault *vmf, * revocation/unmap and status change occurs * whilst holding memory_lock. */ - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) ret =3D VM_FAULT_SIGBUS; else ret =3D vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); @@ -246,7 +259,7 @@ static bool vfio_pci_dma_buf_is_mappable(struct dma_buf= *dmabuf) * on: for example, users should not be mmap()ing a buffer * that's being moved [by a user-triggered activity]. 
*/ - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return false; =20 return true; @@ -296,6 +309,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D= { .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, .release =3D vfio_pci_dma_buf_release, .mmap =3D vfio_pci_dma_buf_mmap, + .revoke =3D vfio_pci_dma_buf_revoke, }; =20 /* @@ -320,7 +334,7 @@ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachm= ent *attachment, return -EOPNOTSUPP; =20 priv =3D attachment->dmabuf->priv; - if (priv->revoked) + if (priv->status !=3D VFIO_PCI_DMABUF_OK) return -ENODEV; =20 /* More than one range to iommufd will require proper DMABUF support */ @@ -506,7 +520,8 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, INIT_LIST_HEAD(&priv->dmabufs_elm); down_write(&vdev->memory_lock); dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D !__vfio_pci_memory_enabled(vdev); + priv->status =3D __vfio_pci_memory_enabled(vdev) ? VFIO_PCI_DMABUF_OK : + VFIO_PCI_DMABUF_TEMP_REVOKED; list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); dma_resv_unlock(priv->dmabuf->resv); up_write(&vdev->memory_lock); @@ -541,7 +556,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device = *vdev, bool revoked) lockdep_assert_held_write(&vdev->memory_lock); /* * Holding memory_lock ensures a racing - * vfio_pci_dma_buf_mmap_*_fault() observes priv->revoked + * vfio_pci_dma_buf_mmap_*_fault() observes priv->status * properly. */ =20 @@ -549,9 +564,11 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device= *vdev, bool revoked) if (!get_file_active(&priv->dmabuf->file)) continue; =20 - if (priv->revoked !=3D revoked) { + if ((priv->status =3D=3D VFIO_PCI_DMABUF_OK && revoked) || + (priv->status =3D=3D VFIO_PCI_DMABUF_TEMP_REVOKED && !revoked)) { dma_resv_lock(priv->dmabuf->resv, NULL); - priv->revoked =3D revoked; + priv->status =3D revoked ? 
VFIO_PCI_DMABUF_TEMP_REVOKED : + VFIO_PCI_DMABUF_OK; dma_buf_move_notify(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); =20 @@ -580,7 +597,7 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_devi= ce *vdev) dma_resv_lock(priv->dmabuf->resv, NULL); list_del_init(&priv->dmabufs_elm); priv->vdev =3D NULL; - priv->revoked =3D true; + priv->status =3D VFIO_PCI_DMABUF_PERM_REVOKED; dma_buf_move_notify(priv->dmabuf); dma_resv_unlock(priv->dmabuf->resv); unmap_mapping_range(priv->dmabuf->file->f_mapping, @@ -590,3 +607,28 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_dev= ice *vdev) } up_write(&vdev->memory_lock); } + +static int vfio_pci_dma_buf_revoke(struct dma_buf *dmabuf) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + struct vfio_pci_core_device *vdev; + + vdev =3D READ_ONCE(priv->vdev); + + if (!vdev) + return -ENODEV; + + scoped_guard(rwsem_read, &vdev->memory_lock) { + if (priv->status =3D=3D VFIO_PCI_DMABUF_PERM_REVOKED) + return -EBADFD; + + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->status =3D VFIO_PCI_DMABUF_PERM_REVOKED; + dma_buf_move_notify(priv->dmabuf); + dma_resv_unlock(priv->dmabuf->resv); + + unmap_mapping_range(priv->dmabuf->file->f_mapping, + 0, priv->size, 1); + } + return 0; +} --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A2084657D8; Thu, 26 Feb 2026 20:23:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137391; cv=none; b=GnFDitXRkRUbmThWmyiLQjuMuEu6dcoWRyMaAC33e2KSFk6s9Ib1ZBHPMTck5xBu+tQgojVRyN9nUCCmRxyw+9fhcsk0FF/6Dcett0aqMGIBQ92tLxL/8uyBay8wXE+JTM19O5pyzHV0hiwuV4qPPOFAI5c57FROZf/63c0DQ8A= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1772137391; c=relaxed/simple; bh=+hmoajerUjgkOA+GtpR1T1dbzFW+mfYacHuqhuEefzI=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sNLIJQbQUJvQyJgiO9GayeJSMQnefsebJQFOcrH92IFQ1HVmHn3FW12LRErppHUrtQYeX6KkHLNHqo2kRQe0NwXP9eQaCaT2QfXjge07ipgzL2899F4s+e4iw5HMauwIIiM0Uq/Wcme/nr03FSD6+VOAbEH1zfOOqFzurKpqWgE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=t17APBLU; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="t17APBLU" Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 61QHCNlC1891453; Thu, 26 Feb 2026 12:22:57 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=p4qhrvMfn67SpV2MVw3xZIKd+Em4yb7TrPNhhwEdCuo=; b=t17APBLUkTAk TDuePMsscDzk42nuhHQNYT0Ki9+g8Oujt9ATD+FfxXKRDT60ed/9WX3gaUBplyy2 fSjJujdpp4NS1Lx8xS2FqUn+HB5s9kXIi7sGmhq+lqyXvA+bTvLxf/tpoKK9J6tT 7I/Tt7Yqh9nq/XeWw+0A9rLu03x5hNDqL2r6dHEZVaAL83A4ZKS4z3enT0B1UzOZ MbHJ4iymy1uc6P+q2CZVwKVjzqtDg9lkgekr22GzajmPdRvb6b3knhJ+ieSPp2IK EEWopxC53+arEgXjQw1rq0Fp97can/60jIqgOjoyTHAEug+LzUEVZaLiegznYTcl L846ch1Vbw== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4cjmp9dnm2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:22:57 -0800 (PST) Received: from localhost 
(2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:56 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 6/7] vfio/pci: Add mmap() attributes to DMABUF feature Date: Thu, 26 Feb 2026 12:22:02 -0800 Message-ID: <20260226202211.929005-7-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: tLZAnEt99yhMVR7DzzedwG1Q3dqkNAqK X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX15eKNzaG/OmP JFo3cdA2DKdUsgi8aH90lXhs5OMcRD7XzA97n5hdziEdEWmdyrYdnxdK0PJZdQVyzJ/XLBHIMwj qJL6SlcuMCuHgiQ0/TuqPlBY4fFt2hp183lpHN591XieNyxFQfOrTQZ8+6ZoZ/rOddth5U7N/TD FjTA+V8Rp7vXPfToGi40TmaPianF5D9K9vBtl5zVS1KnchMnIX1ghgZwNogwYfFovthbdct58yx BWqCfP0qLO2p+iGbx/xGiHLh3AE/dK5SUka6uin+x+GffqNXNtcWXL+K8U/vrBfwunu3Busiu3Z an0F0I+9k5EPQRFdpk2IubbFH1IP68nTkHva23hF49VRObMu0ZBG2ZvY5bQUFQWxHsdM8MOjE/b zLUgmyACMGwhWlnkCEcgti7VcRXe3sGQGi8Zdnm4esYA0feklcvHPLeDWYfatBi8a4/Nerd0HHb xZeiBGylNYCFg65TbWA== X-Proofpoint-ORIG-GUID: tLZAnEt99yhMVR7DzzedwG1Q3dqkNAqK X-Authority-Analysis: v=2.4 cv=abZsXBot c=1 sm=1 tr=0 ts=69a0aba1 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=dDaxl782XL0KO6pWJDcA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: 
vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" A new field is reserved in vfio_device_feature_dma_buf.flags to request CPU-facing memory type attributes for mmap()s of the buffer. Add a flag VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC, which results in WC PTEs for the DMABUF's BAR region. Signed-off-by: Matt Evans --- drivers/vfio/pci/vfio_pci_dmabuf.c | 18 ++++++++++++++---- include/uapi/linux/vfio.h | 12 +++++++++--- 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c index af30ca205f31..d66a918e9934 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -28,6 +28,7 @@ struct vfio_pci_dma_buf { struct dma_buf_phys_vec *phys_vec; struct p2pdma_provider *provider; u32 nr_ranges; + u32 attrs; enum vfio_pci_dma_buf_status status; }; =20 @@ -286,13 +287,15 @@ static int vfio_pci_dma_buf_mmap(struct dma_buf *dmab= uf, struct vm_area_struct * return -EINVAL; =20 vma->vm_private_data =3D priv; - vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); + + if (priv->attrs =3D=3D VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC) + vma->vm_page_prot =3D pgprot_writecombine(vma->vm_page_prot); + else + vma->vm_page_prot =3D pgprot_noncached(vma->vm_page_prot); vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); =20 /* * See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. - * - * FIXME: get mapping attributes from dmabuf? 
*/ vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); @@ -402,6 +405,12 @@ static int validate_dmabuf_input(struct vfio_device_fe= ature_dma_buf *dma_buf, size_t length =3D 0; u32 i; =20 + if ((dma_buf->flags !=3D 0) && + ((dma_buf->flags & ~VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) || + ((dma_buf->flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK) !=3D + VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC))) + return -EINVAL; + for (i =3D 0; i < dma_buf->nr_ranges; i++) { u64 offset =3D dma_ranges[i].offset; u64 len =3D dma_ranges[i].length; @@ -446,7 +455,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf))) return -EFAULT; =20 - if (!get_dma_buf.nr_ranges || get_dma_buf.flags) + if (!get_dma_buf.nr_ranges) return -EINVAL; =20 /* @@ -490,6 +499,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_= device *vdev, u32 flags, priv->vdev =3D vdev; priv->nr_ranges =3D get_dma_buf.nr_ranges; priv->size =3D length; + priv->attrs =3D get_dma_buf.flags & VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK; ret =3D vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider, get_dma_buf.region_index, priv->phys_vec, dma_ranges, diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index ac2329f24141..9e0fbf333452 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1487,7 +1487,9 @@ struct vfio_device_feature_bus_master { * etc. offset/length specify a slice of the region to create the dmabuf f= rom. * nr_ranges is the total number of (P2P DMA) ranges that comprise the dma= buf. * - * flags should be 0. + * flags contains: + * - A field for userspace mapping attribute: by default, suitable for reg= ular + * MMIO. Alternate attributes (such as WC) can be selected. * * Return: The fd number on success, -1 and errno is set on failure. 
*/ @@ -1501,8 +1503,12 @@ struct vfio_region_dma_range { struct vfio_device_feature_dma_buf { __u32 region_index; __u32 open_flags; - __u32 flags; - __u32 nr_ranges; + __u32 flags; + /* Flags sub-field reserved for attribute enum */ +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_MASK (0xf << 28) +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_UC (0 << 28) +#define VFIO_DEVICE_FEATURE_DMA_BUF_ATTR_WC (1 << 28) + __u32 nr_ranges; struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges); }; =20 --=20 2.47.3 From nobody Tue Apr 7 17:13:28 2026 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C97B5477984; Thu, 26 Feb 2026 20:23:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137396; cv=none; b=XUXFlDL6d1wAfQfjfHiM0wvrLQnjn0ZMHpLy3EuQ0emS3KJ7tCdRczwisgf2vTHDOijYaGOEduUxhHkilc1wihz5DqlAsms9CjoOe0/H2QfvfuMkisd4MojmEoRdTDVxdGH+Ow2ENiYya4cmVVufy6+erNhrSxwGjqLTZTDBnW4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772137396; c=relaxed/simple; bh=/xR4u/3qDm/6fWrzI1KUDlZLroW/Z6sT3P5+JRxToKY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=r9Buze4OzBSCa81ZDkMT6pFIudOwhpSXsIX5jUMVPDVq2bcTBi6686y5W3u/rIKkCGAEoqWkJY47qwhv1W6GoVTIdKiRdOkqfroOenz67aVCBXiQAzml0Nvi1E4/XrAr9gPp3oBdaM3LbhwuGAbkt36zFYzZenZ2EGecNyAfZIY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=IH5UgA8r; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="IH5UgA8r" Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.18.1.11/8.18.1.11) with ESMTP id 61QGTBp63090278; Thu, 26 Feb 2026 12:23:00 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=30cmGsfadDddfi+TV1dmEOO+sq6Mvzs/Yd6JYj3bT/4=; b=IH5UgA8rqx3J yG7UieTmc7KAzhfYG4KCzIrN5olHnXELliNLCEeVF6THHPFwsuLDZWaVJSnwWVMO e7fADPyvLa2dx9YLFQHPt69B+8oGFf5SfsRWJRTw8yNl+Bri3dkeu4HlQAL8VXuT dxQGWZY9MwOa4E51mzCZ9a2STkX5RtLCFpOdPwCLIK8wNW744+F9zcVPqAyk/OKE t6ReuBpJtUy05XLpe96Sm9+PljCabuWbUT0kCEtWIGvbBOsanTevbxXsi1FEdN9d 8uCDnkKKqHuZ+5vY0IwXcw+SDvzzC81Wh4gSN87CEHNTQ4TzxabeLgbCQZxlsJkL O+yDRoeVKQ== Received: from maileast.thefacebook.com ([163.114.135.16]) by m0001303.ppops.net (PPS) with ESMTPS id 4cjsr4jsd4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 26 Feb 2026 12:22:59 -0800 (PST) Received: from localhost (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.35; Thu, 26 Feb 2026 20:22:58 +0000 From: Matt Evans To: Alex Williamson , Leon Romanovsky , Jason Gunthorpe , Alex Mastro , Mahmoud Adam , David Matlack CC: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Sumit Semwal , =?UTF-8?q?Christian=20K=C3=B6nig?= , Kevin Tian , Ankit Agrawal , Pranjal Shrivastava , Alistair Popple , Vivek Kasireddy , , , , , Subject: [RFC PATCH 7/7] [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test Date: Thu, 26 Feb 2026 12:22:03 -0800 Message-ID: <20260226202211.929005-8-mattev@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: 
<20260226202211.929005-1-mattev@meta.com> References: <20260226202211.929005-1-mattev@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjI2MDE4NiBTYWx0ZWRfX95vCTs1KcWNK jLzTP04M3+H+relHno8sjNGvCxamPInlF/yccBiSgSHfDQJIzaZoWUs+Y7zUh948Wsv5fXhRqpx ZyJ+PEfd2eU3q2clkBDMSbsqEZMkUUD/PyjRuKXDFGynXw++gpgflYuRmaIT+eXFBnkGLSfqNMV stGv5HwHt0vwdcQmx5JHsgau0wSE2Uj7CL5L30Rg5CtnLtTNahocFJ6dRTiLxhSjlCFaS2DRmmV qYKyOs2M/dHQ/9CeOlJ7+mxKbxsJ/y5K6WM1FklQKmZq2291v/U0yT6gBExrmywB1i+3tJQrdan 2GpRN1A3nZ/zep9UDTY8x/Hv5RYjcBZopkxhmofJHFiESsHTVwCWNlf7kw/NXuNMX6x9CORu1ep NJ8U5YQazexMEmIkzTioiEBQq+cf/Sp+oMSZwMBwsnkIAIU0uB2DbamPmePTlBpS7NjUPl3Z8Qz q4yzJIAhtSY2U1gLByQ== X-Authority-Analysis: v=2.4 cv=daeNHHXe c=1 sm=1 tr=0 ts=69a0aba4 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=Mpw57Om8IfrbqaoTuvik:22 a=GgsMoib0sEa3-_RKJdDe:22 a=VabnemYjAAAA:8 a=jgVWRkJC2vtQ8RevbvwA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-GUID: CPOa9ihGe5gJqTztl1YxoiUyNX9Wxpeb X-Proofpoint-ORIG-GUID: CPOa9ihGe5gJqTztl1YxoiUyNX9Wxpeb X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-26_02,2026-02-26_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" This test exercises VFIO DMABUF mmap() to userspace, including various revocation/shutdown cases (which make the VMA inaccessible). This is a TEMPORARY test, just to illustrate a new UAPI and DMABUF/mmap() usage. Since it originates from out-of-tree code, it duplicates some of the VFIO device setup code in .../selftests/vfio/lib. Instead, the tests should be folded into the existing VFIO tests. 
Signed-off-by: Matt Evans --- tools/testing/selftests/vfio/Makefile | 1 + .../vfio/standalone/vfio_dmabuf_mmap_test.c | 822 ++++++++++++++++++ 2 files changed, 823 insertions(+) create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mma= p_test.c diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftest= s/vfio/Makefile index 3c796ca99a50..3382a2617f2d 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -4,6 +4,7 @@ TEST_GEN_PROGS +=3D vfio_iommufd_setup_test TEST_GEN_PROGS +=3D vfio_pci_device_test TEST_GEN_PROGS +=3D vfio_pci_device_init_perf_test TEST_GEN_PROGS +=3D vfio_pci_driver_test +TEST_GEN_PROGS +=3D standalone/vfio_dmabuf_mmap_test =20 TEST_FILES +=3D scripts/cleanup.sh TEST_FILES +=3D scripts/lib.sh diff --git a/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.= c b/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c new file mode 100644 index 000000000000..450d6e883bb0 --- /dev/null +++ b/tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c @@ -0,0 +1,822 @@ +/* + * Tests for VFIO DMABUF userspace mmap() + * + * As well as the basics (mmap() a BAR resource to userspace), test + * shutdown/unmapping, aliasing, and DMABUF revocation scenarios. + * + * This test relies on being attached to a QEMU EDU device (for a + * simple known MMIO layout). 
Example invocation, assuming function + * 0000:00:03.0 is the target: + * + * # lspci -n -s 00:03.0 + * 00:03.0 00ff: 1234:11e8 (rev 10) + * + * # readlink /sys/bus/pci/devices/0000\:00\:03.0/iommu_group + * ../../../../../kernel/iommu_groups/3 + * + * (if there's a driver already attached) + * # echo 0000:00:03.0 > /sys/bus/pci/devices/0000:00:03.0/driver/unbind + * + * (and, might need) + * # echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interr= upts + * + * Attach to VFIO: + * # echo 1234 11e8 > /sys/bus/pci/drivers/vfio-pci/new_id + * + * There should be only one thing in the group: + * # ls /sys/bus/pci/devices/0000:00:03.0/iommu_group/devices + * + * Then given above an invocation would be: + * # this_test -r 0000:00:03.0 -g 3 + * + * However, note the QEMU EDU device has a very small address span of + * useful things in BAR0, which makes testing a non-zero BAR offset + * impossible. An "extended EDU" device is supported, which just + * presents a large chunk of memory as a second BAR resource: this + * allows non-zero BAR offsets to be tested. See below for a QEMU + * diff... + * + * Copyright (c) Meta Platforms, Inc. and affiliates. + * + * This software may be used and distributed according to the terms of the + * GNU General Public License version 2. 
+ */ + +/* +diff --git a/hw/misc/edu.c b/hw/misc/edu.c +index cece633e11..5f119e0642 100644 +--- a/hw/misc/edu.c ++++ b/hw/misc/edu.c +@@ -47,6 +47,7 @@ DECLARE_INSTANCE_CHECKER(EduState, EDU, + struct EduState { + PCIDevice pdev; + MemoryRegion mmio; ++ MemoryRegion ram; +=20 + QemuThread thread; + QemuMutex thr_mutex; +@@ -386,7 +387,12 @@ static void pci_edu_realize(PCIDevice *pdev, Error **= errp) +=20 + memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu, + "edu-mmio", 1 * MiB); ++ memory_region_init_ram(&edu->ram, OBJECT(edu), "edu-ram", 64 * MiB, &= error_fatal); + pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio); ++ pci_register_bar(pdev, 1, ++ PCI_BASE_ADDRESS_SPACE_MEMORY | ++ PCI_BASE_ADDRESS_MEM_PREFETCH | ++ PCI_BASE_ADDRESS_MEM_TYPE_64, &edu->ram); + } +=20 + static void pci_edu_uninit(PCIDevice *pdev) +*/ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define ROUND_UP(x, to) (((x) + (to) - 1) & ~((to) - 1)) +#define MiB(x) ((x) * 1024ULL * 1024) + +#define EDU_REG_MAGIC 0x00 +#define EDU_MAGIC_VAL 0x010000edu +#define EDU_REG_INVERT 0x04 + +#define FAIL_IF(cond, msg...) \ + do { \ + if (cond) { \ + printf("\n\nFAIL:\t"); \ + printf(msg); \ + exit(1); \ + } \ + } while (0) + +static int vfio_setup(int groupnr, char *rid_str, + struct vfio_region_info *out_mappable_regions, + int nr_regions, int *out_nr_regions, int *out_vfio_cfd, + int *out_vfio_devfd) +{ + /* Create a new container, add group to it, open device, read + * resource, reset, etc. 
Based on the example code in + * Documentation/driver-api/vfio.rst + */ + + int container =3D open("/dev/vfio/vfio", O_RDWR); + + int r =3D ioctl(container, VFIO_GET_API_VERSION); + + if (r !=3D VFIO_API_VERSION) { + /* Unknown API version */ + printf("-E- Unknown API ver %d\n", r); + return 1; + } + + if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) !=3D 1) { + printf("-E- Doesn't support type 1\n"); + return 1; + } + + char devpath[PATH_MAX]; + + snprintf(devpath, PATH_MAX - 1, "/dev/vfio/%d", groupnr); + /* Open the group */ + int group =3D open(devpath, O_RDWR); + + if (group < 0) { + printf("-E- Can't open VFIO device (group %d)\n", groupnr); + return 1; + } + + /* Test the group is viable and available */ + struct vfio_group_status group_status =3D { .argsz =3D sizeof( + group_status) }; + + if (ioctl(group, VFIO_GROUP_GET_STATUS, &group_status)) { + perror("-E- Can't get group status"); + return 1; + } + + if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) { + /* Group is not viable (ie, not all devices bound for vfio) */ + printf("-E- Group %d is not viable!\n", groupnr); + return 1; + } + + /* Add the group to the container */ + if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container)) { + perror("-E- Can't add group to container"); + return 1; + } + + /* Enable the IOMMU model we want */ + if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)) { + perror("-E- Can't select T1"); + return 1; + } + + /* Get additional IOMMU info */ + struct vfio_iommu_type1_info iommu_info =3D { .argsz =3D sizeof( + iommu_info) }; + + if (ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info)) { + perror("-E- Can't get VFIO info"); + return 1; + } + + /* Get a file descriptor for the device */ + int device =3D ioctl(group, VFIO_GROUP_GET_DEVICE_FD, rid_str); + + if (device < 0) { + perror("-E- Can't get device fd"); + return 1; + } + close(group); + + /* Test and setup the device */ + struct vfio_device_info device_info =3D { .argsz =3D sizeof(device_info) = }; + + if 
(ioctl(device, VFIO_DEVICE_GET_INFO, &device_info)) { + perror("-E- Can't get device info"); + return 1; + } + printf("-i- %d device regions, flags 0x%x\n", device_info.num_regions, + device_info.flags); + + /* Regions are BAR0-5 then ROM, config, VGA */ + int out_region =3D 0; + + for (int i =3D 0; i < device_info.num_regions; i++) { + struct vfio_region_info reg =3D { .argsz =3D sizeof(reg) }; + + reg.index =3D i; + + if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg)) { + /* We expect EINVAL if there's no VGA region */ + printf("-W- Region %d: ERROR %d\n", i, errno); + } else { + printf("-i- Region %d: flags 0x%08x (%c%c%c), cap_offs %d, size 0x%llx,= offs 0x%llx\n", + i, reg.flags, + (reg.flags & VFIO_REGION_INFO_FLAG_READ) ? 'R' : + '-', + (reg.flags & VFIO_REGION_INFO_FLAG_WRITE) ? 'W' : + '-', + (reg.flags & VFIO_REGION_INFO_FLAG_MMAP) ? 'M' : + '-', + reg.cap_offset, reg.size, reg.offset); + + if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) && + (out_region < nr_regions)) + out_mappable_regions[out_region++] =3D reg; + } + } + *out_nr_regions =3D out_region; + +#ifdef THERE_ARE_NO_IRQS_YET + for (int i =3D 0; i < device_info.num_irqs; i++) { + struct vfio_irq_info irq =3D { .argsz =3D sizeof(irq) }; + + irq.index =3D i; + + ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq); + + /* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */ + } +#endif + /* Gratuitous device reset and go... 
*/ + if (ioctl(device, VFIO_DEVICE_RESET)) + perror("-W- Can't reset device (continuing)"); + + *out_vfio_cfd =3D container; + *out_vfio_devfd =3D device; + + return 0; +} + +static int vfio_feature_present(int dev_fd, uint32_t feature) +{ + struct vfio_device_feature probeftr =3D { + .argsz =3D sizeof(probeftr), + .flags =3D VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_GET | + feature, + }; + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &probeftr) =3D=3D 0; +} + +static int vfio_create_dmabuf(int dev_fd, uint32_t region, uint64_t offset, + uint64_t length) +{ + uint64_t ftrbuf + [ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_dma_buf) + + sizeof(struct vfio_region_dma_range), + 8) / + 8]; + + struct vfio_device_feature *f =3D (struct vfio_device_feature *)ftrbuf; + struct vfio_device_feature_dma_buf *db =3D + (struct vfio_device_feature_dma_buf *)f->data; + struct vfio_region_dma_range *range =3D + (struct vfio_region_dma_range *)db->dma_ranges; + + f->argsz =3D sizeof(ftrbuf); + f->flags =3D VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF; + db->region_index =3D region; + db->open_flags =3D O_RDWR | O_CLOEXEC; + db->flags =3D 0; + db->nr_ranges =3D 1; + range->offset =3D offset; + range->length =3D length; + + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &ftrbuf); +} + +/* As above, but try multiple ranges in one dmabuf */ +static int vfio_create_dmabuf_dual(int dev_fd, uint32_t region, + uint64_t offset0, uint64_t length0, + uint64_t offset1, uint64_t length1) +{ + uint64_t ftrbuf + [ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_dma_buf) + + (sizeof(struct vfio_region_dma_range) * 2), + 8) / + 8]; + + struct vfio_device_feature *f =3D (struct vfio_device_feature *)ftrbuf; + struct vfio_device_feature_dma_buf *db =3D + (struct vfio_device_feature_dma_buf *)f->data; + struct vfio_region_dma_range *range =3D + (struct vfio_region_dma_range *)db->dma_ranges; + + f->argsz =3D sizeof(ftrbuf); + 
f->flags =3D VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF; + db->region_index =3D region; + db->open_flags =3D O_RDWR | O_CLOEXEC; + db->flags =3D 0; + db->nr_ranges =3D 2; + range[0].offset =3D offset0; + range[0].length =3D length0; + range[1].offset =3D offset1; + range[1].length =3D length1; + + return ioctl(dev_fd, VFIO_DEVICE_FEATURE, &ftrbuf); +} + +static volatile uint32_t *mmap_resource_aligned(size_t size, + unsigned long align, int fd, + unsigned long offset) +{ + void *v; + + if (align <=3D getpagesize()) { + v =3D mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, + offset); + FAIL_IF(v =3D=3D MAP_FAILED, + "Can't mmap fd %d (size 0x%lx, offset 0x%lx), %d\n", fd, + size, offset, errno); + } else { + size_t resv_size =3D size + align; + void *resv =3D + mmap(0, resv_size, 0, MAP_PRIVATE | MAP_ANON, -1, 0); + FAIL_IF(resv =3D=3D MAP_FAILED, + "Can't mmap reservation, size 0x%lx, %d\n", resv_size, + errno); + + uintptr_t pos =3D ((uintptr_t)resv + (align - 1)) & ~(align - 1); + + v =3D mmap((void *)pos, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_FIXED, fd, offset); + FAIL_IF(v =3D=3D MAP_FAILED, + "Can't mmap-fixed fd %d (size 0x%lx, offset 0x%lx), %d\n", + fd, size, offset, errno); + madvise((void *)v, size, MADV_HUGEPAGE); + + /* Tidy */ + if (pos > (uintptr_t)resv) + munmap(resv, pos - (uintptr_t)resv); + if (pos + size < (uintptr_t)resv + resv_size) + munmap((void *)pos + size, + (uintptr_t)resv + resv_size - (pos + size)); + } + + return (volatile uint32_t *)v; +} + +static volatile uint32_t *mmap_resource(size_t size, int fd, + unsigned long offset) +{ + return mmap_resource_aligned(size, getpagesize(), fd, offset); +} + +static void check_mmio(volatile uint32_t *base) +{ + static uint32_t magic =3D 0xdeadbeef; + uint32_t v; + + printf("-i- MMIO check: "); + + /* Trivial MMIO */ + v =3D base[EDU_REG_MAGIC / 4]; + FAIL_IF(v !=3D EDU_MAGIC_VAL, + "Magic value %08x incorrect, BAR map bad?\n", v); + + base[EDU_REG_INVERT / 4] =3D 
magic; + v =3D base[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~magic, "Inverterizer value %08x bad (should be %08x)\n", + v, ~magic); + printf("OK\n"); + + magic =3D (magic << 1) ^ (magic >> 1) ^ (magic << 7); +} + +static sigjmp_buf jmpbuf; + +static void sighandler(int sig) +{ + printf("*** Signal %d ***\n", sig); + siglongjmp(jmpbuf, sig); +} + +static void setup_signals(void) +{ + struct sigaction sa =3D { + .sa_handler =3D sighandler, + .sa_flags =3D 0, + }; + + sigaction(SIGBUS, &sa, NULL); +} + +static int vfio_dmabuf_test(int groupnr, char *rid_str) +{ + /* Only expecting one or two regions */ + struct vfio_region_info bar_region[2]; + int num_regions =3D 0; + int container_fd, dev_fd; + int r =3D vfio_setup(groupnr, rid_str, &bar_region[0], 2, &num_regions, + &container_fd, &dev_fd); + + FAIL_IF(r, "VFIO setup failed\n"); + FAIL_IF(!vfio_feature_present(dev_fd, VFIO_DEVICE_FEATURE_DMA_BUF), + "VFIO DMABUF support not available\n"); + + printf("-i- Container fd %d, device fd %d, and got DMA_BUF\n", + container_fd, dev_fd); + + setup_signals(); + + /////////////////////////////////////////////////////////////////////////= /////// + + /* Real basics: create DMABUF, and mmap it, and access MMIO through it. + * Do this for 2nd BAR if present, too (just plain memory). + */ + printf("\nTEST: Create DMABUF, map it\n"); + int bar_db_fd =3D vfio_create_dmabuf(dev_fd, /* region */ 0, + /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar0 =3D + mmap_resource(bar_region[0].size, bar_db_fd, 0); + + printf("-i- Mapped DMABUF BAR0 at %p+0x%llx\n", dbbar0, + bar_region[0].size); + check_mmio(dbbar0); + + /* TEST: Map the traditional VFIO one _second_; it should still work. 
*/ + printf("\nTEST: Map the regular VFIO BAR\n"); + volatile uint32_t *vfiobar =3D + mmap_resource(bar_region[0].size, dev_fd, bar_region[0].offset); + + printf("-i- Mapped VFIO BAR0 at %p+0x%llx\n", vfiobar, + bar_region[0].size); + check_mmio(vfiobar); + + /* Test plan: + * + * - Revoke the first DMABUF, check for fault + * - Check VFIO BAR access still works + * - Revoke first DMABUF fd again: -EBADFD + * - create new DMABUF for same (previously-revoked) region: accessible + * + * - Create overlapping DMABUFs: map success, maps alias OK + * - Create a second mapping of the second DMABUF, maps alias OK + * - Destroy one by revoking through a dup()ed fd: check mapping revoked + * - Check original is still accessible + * + * If we have a larger (>4K of accessible stuff!) second BAR resource: + * - Map it, create an overlapping alias with offset !=3D 0 + * - Check alias/offset is sane + * + * Last: + * - close container_fd and dev_fd: check DMABUF mapping revoked + * - try revoking that dmabuf_fd: -ENODEV + */ + + printf("\nTEST: Revocation of first DMABUF\n"); + r =3D ioctl(bar_db_fd, DMA_BUF_IOCTL_REVOKE); + FAIL_IF(r !=3D 0, "Can't revoke: %d\n", r); + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + // Try an access: expect BOOM + check_mmio(dbbar0); + FAIL_IF(true, "Expecting fault after revoke!\n"); + } + printf("-i- Revoked OK\n"); + + printf("\nTEST: Access through VFIO-mapped region still works\n"); + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(vfiobar); + else + FAIL_IF(true, "Expecting VFIO-mapped BAR to still work!\n"); + + printf("\nTEST: Double-revoke\n"); + r =3D ioctl(bar_db_fd, DMA_BUF_IOCTL_REVOKE); + FAIL_IF(r !=3D -1 || errno !=3D EBADFD, + "Expecting 2nd revoke to give EBADFD, got %d errno %d\n", r, + errno); + printf("-i- Correctly failed second revoke\n"); + + printf("\nTEST: Can't mmap() revoked DMABUF\n"); + void *dbfail =3D mmap(0, bar_region[0].size, PROT_READ | PROT_WRITE, + MAP_SHARED, bar_db_fd, 0); + FAIL_IF(dbfail !=3D MAP_FAILED, 
"mmap() should fail\n"); + printf("-i- OK\n"); + + printf("\nTEST: Recreate new DMABUF for previously-revoked region\n"); + int bar_db_fd_2 =3D vfio_create_dmabuf( + dev_fd, /* region */ 0, /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd_2 < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar0_2 =3D + mmap_resource(bar_region[0].size, bar_db_fd_2, 0); + + printf("-i- Mapped 2nd DMABUF BAR0 at %p+0x%llx\n", dbbar0_2, + bar_region[0].size); + check_mmio(dbbar0_2); + + munmap((void *)dbbar0, bar_region[0].size); + close(bar_db_fd); + + printf("\nTEST: Create aliasing/overlapping DMABUF\n"); + int bar_db_fd_3 =3D vfio_create_dmabuf( + dev_fd, /* region */ 0, /* offset */ 0, bar_region[0].size); + FAIL_IF(bar_db_fd_3 < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar0_3 =3D + mmap_resource(bar_region[0].size, bar_db_fd_3, 0); + + printf("-i- Mapped 3rd DMABUF BAR0 at %p+0x%llx\n", dbbar0_3, + bar_region[0].size); + check_mmio(dbbar0_3); + + /* Basic aliasing check: Write value through 2nd, read back through 3rd */ + uint32_t v; + + dbbar0_2[EDU_REG_INVERT / 4] =3D 0xfacecace; + v =3D dbbar0_3[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~0xfacecace, + "Alias inverted MMIO value %08x bad (should be %08x)\n", v, + ~0xfacecace); + printf("-i- Aliasing DMABUF OK\n"); + + printf("\nTEST: Create a double-mapping of DMABUF\n"); + /* Create another mmap of the existing aliasing DMABUF fd */ + volatile uint32_t *dbbar0_3_2 =3D + mmap_resource(bar_region[0].size, bar_db_fd_3, 0); + + printf("-i- Mapped 3rd DMABUF BAR0 _again_ at %p+0x%llx\n", dbbar0_3_2, + bar_region[0].size); + /* Can we see the value we wrote before? 
*/ + v =3D dbbar0_3_2[EDU_REG_INVERT / 4]; + FAIL_IF(v !=3D ~0xfacecace, + "Alias alias inverted MMIO value %08x bad (should be %08x)\n", + v, ~0xfacecace); + check_mmio(dbbar0_3_2); + + printf("\nTEST: revoke aliasing DMABUF through dup()ed fd\n"); + int dup_dbfd3 =3D dup(bar_db_fd_3); + + r =3D ioctl(dup_dbfd3, DMA_BUF_IOCTL_REVOKE); + FAIL_IF(r !=3D 0, "Can't revoke: %d\n", r); + + /* Both of the mmap()s made should now be gone */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_3); + FAIL_IF(true, "Expecting fault on 1st mmap after revoke!\n"); + } + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_3_2); + FAIL_IF(true, "Expecting fault on 2nd mmap after revoke!\n"); + } + printf("-i- Both aliasing DMABUF mappings revoked OK\n"); + + close(dup_dbfd3); + close(bar_db_fd_3); + munmap((void *)dbbar0_3, bar_region[0].size); + munmap((void *)dbbar0_3_2, bar_region[0].size); + + /* And finally, although the aliasing DMABUF is gone, access + * through the original one should still work: + */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(dbbar0_2); + else + FAIL_IF(true, + "Expecting original DMABUF mapping to still work!\n"); + printf("-i- Aliasing DMABUF removal OK, original still accessible\n"); + + /* If we're attached to a hacked/extended QEMU EDU device with + * a large memory region 1 then we can test things like + * offsets/aliasing. 
+ */ + if (num_regions >=3D 2) { + printf("\nTEST: Second BAR: test overlapping+offset DMABUF\n"); + + printf("-i- Region 1 DMABUF: offset %llx, size %llx\n", + bar_region[1].offset, bar_region[1].size); + int bar1_db_fd =3D + vfio_create_dmabuf(dev_fd, 1, 0, bar_region[1].size); + + FAIL_IF(bar1_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar1 =3D mmap_resource_aligned( + bar_region[1].size, MiB(32), bar1_db_fd, 0); + printf("-i- Mapped DMABUF Region 1 at %p+0x%llx\n", dbbar1, + bar_region[1].size); + + /* Init with known values */ + for (unsigned long i =3D 0; i < (bar_region[1].size); + i +=3D getpagesize()) + dbbar1[i / 4] =3D 0xca77face ^ i; + + v =3D dbbar1[0]; + FAIL_IF(v !=3D 0xca77face, + "DB Region 1 read: Magic value %08x incorrect\n", v); + printf("-i- DB Region 1 read: Magic: 0x%08x\n", v); + + /* TEST: Overlap/aliasing; map same BAR with a range + * offset > 0. Also test disjoint/multi-range DMABUFs + * by creating a second range. This appears as one + * contiguous VA range mapped to a first BAR range + * (starting from range0_offset), then skipping a few + * physical pages, then a second range (starting at + * range1_offset). 
+ */ + unsigned long range0_offset =3D getpagesize() * 3; + unsigned long range1_skip_pages =3D 5; + unsigned long range1_skip =3D getpagesize() * range1_skip_pages; + unsigned long range_size =3D + (bar_region[1].size - range0_offset - range1_skip) / 2; + unsigned long range1_offset =3D + range0_offset + range_size + range1_skip; + unsigned long map_size =3D range_size * 2; + + printf("\nTEST: Second BAR aliasing mapping, two ranges size 0x%lx:\n\t\= t0x%lx-0x%lx, 0x%lx-0x%lx\n", + range_size, range0_offset, range0_offset + range_size, + range1_offset, range1_offset + range_size); + + int bar1_2_db_fd =3D vfio_create_dmabuf_dual( + dev_fd, 1, range0_offset, range_size, range1_offset, + range_size); + FAIL_IF(bar1_2_db_fd < 0, "Can't create DMABUF, %d\n", errno); + + volatile uint32_t *dbbar1_2 =3D + mmap_resource(map_size, bar1_2_db_fd, 0); + + printf("-i- Mapped DMABUF Region 1 alias at %p+0x%lx\n", + dbbar1_2, map_size); + FAIL_IF(dbbar1_2[0] !=3D dbbar1[range0_offset / 4], + "slice2 value mismatch\n"); + + dbbar1[(range0_offset + 4) / 4] =3D 0xfacef00d; + /* Check we can see the value written above at +offset + * from offset 0 of this mapping (since the DMABUF + * itself is offsetted): + */ + v =3D dbbar1_2[4 / 4]; + FAIL_IF(v !=3D 0xfacef00d, + "DB Region 1 alias read: Magic value %08x incorrect\n", + v); + printf("-i- DB Region 1 alias read: Magic 0x%08x, OK\n", v); + + /* Read back the known values across the two + * sub-ranges of the dbbar1_2 mapping, accounting for + * the physical pages skipped between them + */ + for (unsigned long i =3D 0; i < range_size; i +=3D getpagesize()) { + unsigned long t =3D i + range0_offset; + uint32_t want =3D (0xca77face ^ t); + + v =3D dbbar1_2[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from range0 +%08lx (real %08lx)\n", + want, v, i, t); + } + for (unsigned long i =3D range_size; i < (range_size * 2); + i +=3D getpagesize()) { + unsigned long t =3D i + range1_offset - range_size; + uint32_t want =3D 
(0xca77face ^ t); + + v =3D dbbar1_2[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from range1 +%08lx (real %08lx)\n", + want, v, i, t); + } + + printf("\nTEST: Third BAR aliasing mapping, testing mmap() non-zero offs= et:\n"); + + unsigned long smaller =3D range_size - 0x1000; + volatile uint32_t *dbbar1_3 =3D mmap_resource_aligned( + smaller, MiB(32), bar1_2_db_fd, range_size); + printf("-i- Mapped DMABUF Region 1 range 1 alias at %p+0x%lx\n", + dbbar1_3, smaller); + + for (unsigned long i =3D 0; i < smaller; i +=3D getpagesize()) { + unsigned long t =3D i + range1_offset; + uint32_t want =3D (0xca77face ^ t); + + v =3D dbbar1_3[i / 4]; + FAIL_IF(v !=3D want, + "Expected %08x (got %08x) from 3rd range1 +%08lx (real %08lx)\n", + want, v, i, t); + } + printf("-i- mmap offset OK\n"); + + /* TODO: If we can observe hugepages (mechanically, + * rather than human reading debug), we can test + * interesting alignment cases for the PFN search: + * + * - Deny hugepages at start/end of an mmap() that + * starts/ends at non-HP-aligned addresses + * (e.g. first pages are small, middle is fully + * aligned in VA and PFN so 2M, and buffer finishes + * before 2M boundary, so last pages are small). + * + * - Everything aligned nicely except the mmap() size + * is <2MB, so hugepage denied due to straddling + * end. + * + * - Buffer offsets into BAR not aligned, so no huge + * mappings even if mmap() is perfectly aligned. + */ + + /* Check that access after DMABUF fd close still works + * (VMA still holds refcount, obvs!) + */ + close(bar1_2_db_fd); + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + v =3D dbbar1_2[0x4 / 4]; + else + FAIL_IF(true, + "Expecting original DMABUF mapping to still work!\n"); + printf("-i- DB Region 1 alias read 2: Magic 0x%08x, OK\n", v); + printf("-i- Offset check OK\n"); + } + + printf("\nTEST: Shutdown: close VFIO container/device fds, check DMABUF g= one\n"); + + /* Closing all uses of dev_fd (including the VFIO BAR mmap()!) 
+ * will revoke the DMABUF; even though the DMABUF fd might + * remain open, the mapping itself is zapped. Start with a + * plain close (before unmapping the VFIO BAR mapping): + */ + close(dev_fd); + close(container_fd); + printf("-i- VFIO fds closed\n"); + + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) + check_mmio(dbbar0_2); + else + FAIL_IF(true, + "Expecting DMABUF mapping to still work if VFIO mapping still live!\n"); + + munmap((void *)vfiobar, bar_region[0].size); + printf("-i- VFIO BAR unmapped\n"); + + /* The final reference via VFIO should now be gone, and the + * DMABUF should now be destroyed. The mapping of it should + * be inaccessible: + */ + if (sigsetjmp(jmpbuf, 1) =3D=3D 0) { + check_mmio(dbbar0_2); + FAIL_IF(true, + "Expecting DMABUF mapping to fault after VFIO fd shutdown!\n"); + } + printf("-i- DMABUF mappings inaccessible\n"); + + /* Ensure we can't mmap() DMABUF for closed device */ + void *dbfail2 =3D mmap(0, bar_region[0].size, PROT_READ | PROT_WRITE, + MAP_SHARED, bar_db_fd_2, 0); + FAIL_IF(dbfail2 !=3D MAP_FAILED, "mmap() should fail\n"); + printf("-i- Can't mmap DMABUF for closed device, OK\n"); + + /* The DMABUF fd is still open though; try a revoke on it: */ + r =3D ioctl(bar_db_fd_2, DMA_BUF_IOCTL_REVOKE); + FAIL_IF(r !=3D -1 || errno !=3D ENODEV, + "Expecting revoke after shutdown to give ENODEV, got %d errno %d\n", + r, errno); + printf("-i- Correctly failed final revoke\n"); + + munmap((void *)dbbar0_2, bar_region[0].size); + close(bar_db_fd_2); + + printf("\nPASS\n"); + + return 0; +} + +static void usage(char *me) +{ + printf("Usage:\t%s -g -r \n" + "\n" + "\t\tGroup is found via device path, e.g. 
cat /sys/bus/pci/devices= /0000:03:1d.0/iommu_group\n" + "\t\tRID is of the form 0000:03:1d.0\n" + "\n", + me); +} + +int main(int argc, char *argv[]) +{ + /* Get args: IOMMU group and BDF/path */ + int groupnr =3D -1; + char *rid_str =3D NULL; + int arg; + + while ((arg =3D getopt(argc, argv, "g:r:h")) !=3D -1) { + switch (arg) { + case 'g': + groupnr =3D atoi(optarg); + break; + + case 'r': + rid_str =3D strdup(optarg); + break; + case 'h': + default: + usage(argv[0]); + return 1; + } + } + + if (rid_str =3D=3D NULL || groupnr =3D=3D -1) { + usage(argv[0]); + return 1; + } + + printf("-i- Using group number %d, RID '%s'\n", groupnr, rid_str); + + return vfio_dmabuf_test(groupnr, rid_str); +} --=20 2.47.3