From: Alex Williamson
To: alex.williamson@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, mitchell.augustin@canonical.com, clg@redhat.com
Subject: [PATCH 1/5] vfio/type1: Catch zero from pin_user_pages_remote()
Date: Wed, 5 Feb 2025 16:17:17 -0700
Message-ID: <20250205231728.2527186-2-alex.williamson@redhat.com>
In-Reply-To: <20250205231728.2527186-1-alex.williamson@redhat.com>
References: <20250205231728.2527186-1-alex.williamson@redhat.com>

pin_user_pages_remote() can currently return zero for invalid args or
zero nr_pages, neither of which should ever happen. However
vaddr_get_pfns() indicates it should only ever return a positive value
or -errno and there's a theoretical case where this can slip through
and be unhandled by callers. Therefore convert zero to -EFAULT.

Signed-off-by: Alex Williamson
Reported-by: "Mitchell Augustin"
Reviewed-by: "Mitchell Augustin"
Reviewed-by: Jason Gunthorpe
Reviewed-by: Peter Xu
Tested-by: "Mitchell Augustin"
---
 drivers/vfio/vfio_iommu_type1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 50ebc9593c9d..119cf886d8c0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -564,6 +564,8 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	if (ret > 0) {
 		*pfn = page_to_pfn(pages[0]);
 		goto done;
+	} else if (!ret) {
+		ret = -EFAULT;
 	}
 
 	vaddr = untagged_addr_remote(mm, vaddr);
-- 
2.47.1


From: Alex Williamson
To: alex.williamson@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, mitchell.augustin@canonical.com, clg@redhat.com
Subject: [PATCH 2/5] vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch
Date: Wed, 5 Feb 2025 16:17:18 -0700
Message-ID: <20250205231728.2527186-3-alex.williamson@redhat.com>
In-Reply-To: <20250205231728.2527186-1-alex.williamson@redhat.com>
References: <20250205231728.2527186-1-alex.williamson@redhat.com>

This is a step towards passing the structure to vaddr_get_pfns()
directly in order to provide greater distinction between page backed
pfns and pfnmaps.

Signed-off-by: Alex Williamson
Reported-by: "Mitchell Augustin"
Reviewed-by: "Mitchell Augustin"
Reviewed-by: Peter Xu
Tested-by: "Mitchell Augustin"
---
 drivers/vfio/vfio_iommu_type1.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 119cf886d8c0..2e95f5f4d881 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -471,12 +471,12 @@ static int put_pfn(unsigned long pfn, int prot)
 
 #define VFIO_BATCH_MAX_CAPACITY (PAGE_SIZE / sizeof(struct page *))
 
-static void vfio_batch_init(struct vfio_batch *batch)
+static void __vfio_batch_init(struct vfio_batch *batch, bool single)
 {
 	batch->size = 0;
 	batch->offset = 0;
 
-	if (unlikely(disable_hugepages))
+	if (single || unlikely(disable_hugepages))
 		goto fallback;
 
 	batch->pages = (struct page **) __get_free_page(GFP_KERNEL);
@@ -491,6 +491,16 @@ static void vfio_batch_init(struct vfio_batch *batch)
 	batch->capacity = 1;
 }
 
+static void vfio_batch_init(struct vfio_batch *batch)
+{
+	__vfio_batch_init(batch, false);
+}
+
+static void vfio_batch_init_single(struct vfio_batch *batch)
+{
+	__vfio_batch_init(batch, true);
+}
+
 static void vfio_batch_unpin(struct vfio_batch *batch, struct vfio_dma *dma)
 {
 	while (batch->size) {
@@ -730,7 +740,7 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 				  unsigned long *pfn_base, bool do_accounting)
 {
-	struct page *pages[1];
+	struct vfio_batch batch;
 	struct mm_struct *mm;
 	int ret;
 
@@ -738,7 +748,9 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	if (!mmget_not_zero(mm))
 		return -ENODEV;
 
-	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
+	vfio_batch_init_single(&batch);
+
+	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, batch.pages);
 	if (ret != 1)
 		goto out;
 
@@ -757,6 +769,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	}
 
 out:
+	vfio_batch_fini(&batch);
 	mmput(mm);
 	return ret;
 }
-- 
2.47.1


From: Alex Williamson
To: alex.williamson@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, mitchell.augustin@canonical.com, clg@redhat.com
Subject: [PATCH 3/5] vfio/type1: Use vfio_batch for vaddr_get_pfns()
Date: Wed, 5 Feb 2025 16:17:19 -0700
Message-ID: <20250205231728.2527186-4-alex.williamson@redhat.com>
In-Reply-To: <20250205231728.2527186-1-alex.williamson@redhat.com>
References: <20250205231728.2527186-1-alex.williamson@redhat.com>

Passing the vfio_batch to vaddr_get_pfns() allows for greater
distinction between page backed pfns and pfnmaps. In the case of page
backed pfns, vfio_batch.size is set to a positive value matching the
number of pages filled in vfio_batch.pages. For a pfnmap,
vfio_batch.size remains zero as vfio_batch.pages are not used. In both
cases the return value continues to indicate the number of pfns and the
provided pfn arg is set to the initial pfn value.

This allows us to shortcut the pfnmap case, which is detected by the
zero vfio_batch.size. pfnmaps do not contribute to locked memory
accounting, therefore we can update counters and continue directly,
which also enables a future where vaddr_get_pfns() can return a value
greater than one for consecutive pfnmaps.

NB. Now that we're not guessing whether the initial pfn is page backed
or pfnmap, we no longer need to special case the put_pfn() and batch
size reset. It's safe for vfio_batch_unpin() to handle this case.

Signed-off-by: Alex Williamson
Reported-by: "Mitchell Augustin"
Reviewed-by: "Mitchell Augustin"
Reviewed-by: Peter Xu
Tested-by: "Mitchell Augustin"
---
 drivers/vfio/vfio_iommu_type1.c | 62 ++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2e95f5f4d881..939920454da7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -555,12 +555,16 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 
 /*
  * Returns the positive number of pfns successfully obtained or a negative
- * error code.
+ * error code. The initial pfn is stored in the pfn arg. For page-backed
+ * pfns, the provided batch is also updated to indicate the filled pages and
+ * initial offset. For VM_PFNMAP pfns, only the returned number of pfns and
+ * returned initial pfn are provided; subsequent pfns are contiguous.
  */
 static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 			  long npages, int prot, unsigned long *pfn,
-			  struct page **pages)
+			  struct vfio_batch *batch)
 {
+	long pin_pages = min_t(long, npages, batch->capacity);
 	struct vm_area_struct *vma;
 	unsigned int flags = 0;
 	int ret;
@@ -569,10 +573,12 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		flags |= FOLL_WRITE;
 
 	mmap_read_lock(mm);
-	ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
-				    pages, NULL);
+	ret = pin_user_pages_remote(mm, vaddr, pin_pages, flags | FOLL_LONGTERM,
+				    batch->pages, NULL);
 	if (ret > 0) {
-		*pfn = page_to_pfn(pages[0]);
+		*pfn = page_to_pfn(batch->pages[0]);
+		batch->size = ret;
+		batch->offset = 0;
 		goto done;
 	} else if (!ret) {
 		ret = -EFAULT;
@@ -628,32 +634,41 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		*pfn_base = 0;
 	}
 
+	if (unlikely(disable_hugepages))
+		npage = 1;
+
 	while (npage) {
 		if (!batch->size) {
 			/* Empty batch, so refill it. */
-			long req_pages = min_t(long, npage, batch->capacity);
-
-			ret = vaddr_get_pfns(mm, vaddr, req_pages, dma->prot,
-					     &pfn, batch->pages);
+			ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
+					     &pfn, batch);
 			if (ret < 0)
 				goto unpin_out;
 
-			batch->size = ret;
-			batch->offset = 0;
-
 			if (!*pfn_base) {
 				*pfn_base = pfn;
 				rsvd = is_invalid_reserved_pfn(*pfn_base);
 			}
+
+			/* Handle pfnmap */
+			if (!batch->size) {
+				if (pfn != *pfn_base + pinned || !rsvd)
+					goto out;
+
+				pinned += ret;
+				npage -= ret;
+				vaddr += (PAGE_SIZE * ret);
+				iova += (PAGE_SIZE * ret);
+				continue;
+			}
 		}
 
 		/*
-		 * pfn is preset for the first iteration of this inner loop and
-		 * updated at the end to handle a VM_PFNMAP pfn. In that case,
-		 * batch->pages isn't valid (there's no struct page), so allow
-		 * batch->pages to be touched only when there's more than one
-		 * pfn to check, which guarantees the pfns are from a
-		 * !VM_PFNMAP vma.
+		 * pfn is preset for the first iteration of this inner loop due to the
+		 * fact that vaddr_get_pfns() needs to provide the initial pfn for pfnmaps.
+		 * Therefore to reduce redundancy, the next pfn is fetched at the end of
+		 * the loop. A PageReserved() page could still qualify as page backed and
+		 * rsvd here, and therefore continues to use the batch.
 		 */
 		while (true) {
 			if (pfn != *pfn_base + pinned ||
@@ -688,21 +703,12 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 
 			pfn = page_to_pfn(batch->pages[batch->offset]);
 		}
-
-		if (unlikely(disable_hugepages))
-			break;
 	}
 
 out:
 	ret = vfio_lock_acct(dma, lock_acct, false);
 
 unpin_out:
-	if (batch->size == 1 && !batch->offset) {
-		/* May be a VM_PFNMAP pfn, which the batch can't remember. */
-		put_pfn(pfn, dma->prot);
-		batch->size = 0;
-	}
-
 	if (ret < 0) {
 		if (pinned && !rsvd) {
 			for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
@@ -750,7 +756,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 
 	vfio_batch_init_single(&batch);
 
-	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, batch.pages);
+	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, &batch);
 	if (ret != 1)
 		goto out;
 
-- 
2.47.1


From: Alex Williamson
To: alex.williamson@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, mitchell.augustin@canonical.com, clg@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [PATCH 4/5] mm: Provide page mask in struct follow_pfnmap_args
Date: Wed, 5 Feb 2025 16:17:20 -0700
Message-ID: <20250205231728.2527186-5-alex.williamson@redhat.com>
In-Reply-To: <20250205231728.2527186-1-alex.williamson@redhat.com>
References: <20250205231728.2527186-1-alex.williamson@redhat.com>

follow_pfnmap_start() walks the page table for a given address and
fills out the struct follow_pfnmap_args in pfnmap_args_setup(). The
page mask of the page table level is already provided to this latter
function for calculating the pfn.

This page mask can also be useful for the caller to determine the
extent of the contiguous mapping.

For example, vfio-pci now supports huge_fault for pfnmaps and is able
to insert pud and pmd mappings. When we DMA map these pfnmaps, ex. PCI
MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test for
a contiguous pfn range. Providing the mapping page mask allows us to
skip the extent of the mapping level. Assuming a 1GB pud level and 4KB
page size, iterations are reduced by a factor of 256K. In wall clock
time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.
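
The factor of 256K quoted above can be checked with a small standalone
sketch. This is plain userspace C for illustration, not kernel code. The
names BASE_PAGE_SHIFT, PUD_LEVEL_SHIFT, pages_per_mapping() and
lookups_for() are hypothetical, and the shift values simply encode the
4KB page and 1GB pud assumptions stated in the message.

```c
#include <stdint.h>

/* Assumptions for illustration only: 4KB base pages, 1GB pud mappings. */
#define BASE_PAGE_SHIFT 12
#define PUD_LEVEL_SHIFT 30

/* Number of base pages covered by one mapping at the given level. */
static uint64_t pages_per_mapping(unsigned int level_shift)
{
	return UINT64_C(1) << (level_shift - BASE_PAGE_SHIFT);
}

/*
 * Number of page table lookups needed to cover size bytes when each
 * lookup advances by one mapping of the given level (rounding up).
 */
static uint64_t lookups_for(uint64_t size, unsigned int level_shift)
{
	uint64_t step = UINT64_C(1) << level_shift;

	return (size + step - 1) / step;
}
```

Under these assumptions pages_per_mapping(PUD_LEVEL_SHIFT) is 262144,
i.e. 256K. A 32GB BAR needs 8388608 lookups when stepping one 4KB page
at a time and 32 lookups when stepping one 1GB pud at a time.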

Cc: Andrew Morton
Cc: linux-mm@kvack.org
Signed-off-by: Alex Williamson
Reported-by: "Mitchell Augustin"
Reviewed-by: "Mitchell Augustin"
Reviewed-by: Jason Gunthorpe
Reviewed-by: Peter Xu
Tested-by: "Mitchell Augustin"
---
 include/linux/mm.h | 2 ++
 mm/memory.c        | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b1c3db9cf355..0ef7e7a0b4eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2416,11 +2416,13 @@ struct follow_pfnmap_args {
 	 * Outputs:
 	 *
 	 * @pfn: the PFN of the address
+	 * @pgmask: page mask covering pfn
 	 * @pgprot: the pgprot_t of the mapping
 	 * @writable: whether the mapping is writable
 	 * @special: whether the mapping is a special mapping (real PFN maps)
 	 */
 	unsigned long pfn;
+	unsigned long pgmask;
 	pgprot_t pgprot;
 	bool writable;
 	bool special;
diff --git a/mm/memory.c b/mm/memory.c
index 398c031be9ba..97ccd43761b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6388,6 +6388,7 @@ static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
 	args->lock = lock;
 	args->ptep = ptep;
 	args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
+	args->pgmask = addr_mask;
 	args->pgprot = pgprot;
 	args->writable = writable;
 	args->special = special;
-- 
2.47.1


From: Alex Williamson
To: alex.williamson@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, mitchell.augustin@canonical.com, clg@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [PATCH 5/5] vfio/type1: Use mapping page mask for pfnmaps
Date: Wed, 5 Feb 2025 16:17:21 -0700
Message-ID: <20250205231728.2527186-6-alex.williamson@redhat.com>
In-Reply-To: <20250205231728.2527186-1-alex.williamson@redhat.com>
References: <20250205231728.2527186-1-alex.williamson@redhat.com>

vfio-pci supports huge_fault for PCI MMIO BARs and will insert pud and
pmd mappings for well aligned mappings. follow_pfnmap_start() walks the
page table and therefore knows the page mask of the level where the
address is found and returns this through follow_pfnmap_args.pgmask.

Subsequent pfns from this address until the end of the mapping page are
necessarily consecutive. Use this information to retrieve a range of
pfnmap pfns in a single pass.

With optimal mappings and alignment on systems with 1GB pud and 4KB
page size, this reduces iterations for DMA mapping PCI BARs by a factor
of 256K. In real world testing, the overhead of iterating pfns for a VM
DMA mapping a 32GB PCI BAR is reduced from ~1s to sub-millisecond
overhead.

Signed-off-by: Alex Williamson
Reported-by: "Mitchell Augustin"
Reviewed-by: "Mitchell Augustin"
Reviewed-by: Peter Xu
Tested-by: "Mitchell Augustin"
---
 drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 939920454da7..6f3e8d981311 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -520,7 +520,7 @@ static void vfio_batch_fini(struct vfio_batch *batch)
 
 static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 			    unsigned long vaddr, unsigned long *pfn,
-			    bool write_fault)
+			    unsigned long *pgmask, bool write_fault)
 {
 	struct follow_pfnmap_args args = { .vma = vma, .address = vaddr };
 	int ret;
@@ -544,10 +544,12 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 			return ret;
 	}
 
-	if (write_fault && !args.writable)
+	if (write_fault && !args.writable) {
 		ret = -EFAULT;
-	else
+	} else {
 		*pfn = args.pfn;
+		*pgmask = args.pgmask;
+	}
 
 	follow_pfnmap_end(&args);
 	return ret;
@@ -590,15 +592,23 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	vma = vma_lookup(mm, vaddr);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+		unsigned long pgmask;
+
+		ret = follow_fault_pfn(vma, mm, vaddr, pfn, &pgmask,
+				       prot & IOMMU_WRITE);
 		if (ret == -EAGAIN)
 			goto retry;
 
 		if (!ret) {
-			if (is_invalid_reserved_pfn(*pfn))
-				ret = 1;
-			else
+			if (is_invalid_reserved_pfn(*pfn)) {
+				unsigned long epfn;
+
+				epfn = (((*pfn << PAGE_SHIFT) + ~pgmask + 1)
+					& pgmask) >> PAGE_SHIFT;
+				ret = min_t(int, npages, epfn - *pfn);
+			} else {
 				ret = -EFAULT;
+			}
 		}
 	}
 done:
-- 
2.47.1
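
For readers following the epfn calculation in the final hunk, here is a
minimal userspace sketch of the same arithmetic. It is an illustration
under stated assumptions rather than the kernel implementation:
BASE_PAGE_SHIFT, LEVEL_MASK() and pfnmap_run() are hypothetical names,
4KB base pages are assumed, and fixed-width 64-bit types stand in for
the kernel's unsigned long and min_t().

```c
#include <stdint.h>

#define BASE_PAGE_SHIFT 12	/* assumption: 4KB base pages */

/* Address mask for a mapping level, e.g. shift 21 for a 2MB pmd. */
#define LEVEL_MASK(shift) (~((UINT64_C(1) << (shift)) - 1))

/*
 * Given the pfn found at a pfnmap address and the mask of the page table
 * level that maps it, return how many pfns starting at pfn lie within
 * that one mapping, capped at npages.
 */
static uint64_t pfnmap_run(uint64_t pfn, uint64_t pgmask, uint64_t npages)
{
	/*
	 * ~pgmask + 1 is the size of the mapping. Adding it to the address
	 * and masking yields the start of the next mapping at this level;
	 * shifting converts that address back to a pfn.
	 */
	uint64_t epfn = (((pfn << BASE_PAGE_SHIFT) + ~pgmask + 1) & pgmask)
			>> BASE_PAGE_SHIFT;
	uint64_t run = epfn - pfn;

	return run < npages ? run : npages;
}
```

With a 2MB mask, a pfn at the start of the mapping yields a run of 512
pfns and a pfn halfway through yields 256. With a 4KB mask the run is
always 1, which matches the previous behaviour of returning a single
pfn per lookup.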