From nobody Mon May 6 05:14:57 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F88EC433EF for ; Wed, 22 Jun 2022 02:28:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230016AbiFVC2D (ORCPT ); Tue, 21 Jun 2022 22:28:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355471AbiFVC14 (ORCPT ); Tue, 21 Jun 2022 22:27:56 -0400 Received: from out2-smtp.messagingengine.com (out2-smtp.messagingengine.com [66.111.4.26]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D589233; Tue, 21 Jun 2022 19:27:52 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 3190B5C00E2; Tue, 21 Jun 2022 22:27:48 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 21 Jun 2022 22:27:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= invisiblethingslab.com; h=cc:cc:content-transfer-encoding:date :date:from:from:in-reply-to:message-id:mime-version:reply-to :sender:subject:subject:to:to; s=fm2; t=1655864868; x= 1655951268; bh=GcU2tFZp/9iFH/MJo6UjIxw9Kv+4l3HrP6DKPjWQj3w=; b=a l0Emb6cXDROwzCXUToGQnhRpoR1/J6dwLUF3/5kSptwCsV24PLJHfHf4noNGsoMD JwtgENxeuX6dDMf6GnxJ7bpfEog/4/nDObMTHVZ6ZxgnlwpCESdeoCQNiExqEPPQ UQWnpUmGenU/sVjMRC5pTFQtw7elwbJNxgU1PsaO5Ce6QmTVl1NIMBKckri3YIXR OMO0qSupcgbb5CbzxmoM3UNOwAyZUyfGwwLsKPakFb6ldWgdb7VQ0ORTQMcdM3Ql gVgVtAZzxdTIJxMKZkq14N91q7/ZxjzBBTfRch4iLaFJzCj40FHaOLevOvPC4zBi JVKKWbUmIOCcqBFEH4+NA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:message-id :mime-version:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t= 1655864868; x=1655951268; bh=GcU2tFZp/9iFH/MJo6UjIxw9Kv+4l3HrP6D KPjWQj3w=; b=feJyVyQ1s+P9ZzcklTIilzR38/QFuvFFfj64g2AKSCgB4pFB2bn A+JeCCPEaHVj2YhxKlEmB7tep0LiPX44gqnAJ1/+UB1tc3/lY59Vp2Aje9aU688u Q/EgHFkMMIGZB442wEKYsC9qZqRktQZ0ISoTz9rnnimLoSVW6azwhBzLcJgUcGIB nH5a0wMYQu4XsC7m9Ebv9peS8QujYT5YTvqVRcrxP1DK0DSgCJVRhGKGAdHS9pCi BZPsmHqnbtKls+kNTj5XsXdATI2Nt8grY+wJlS8valcIx/VO6zdgawV+eVtz4gMb JzZciDk+eqWeJ4iiYvuOD22jbsGinqsp4Tg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvfedrudefgedgieduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgggfestdekredtredttdenucfhrhhomhepffgvmhhiucfo rghrihgvucfqsggvnhhouhhruceouggvmhhisehinhhvihhsihgslhgvthhhihhnghhslh grsgdrtghomheqnecuggftrfgrthhtvghrnhepieejgedufeeukeeijedukefgueekvdeg iedtudefhfdtffehffeuveefvdfglefgnecuffhomhgrihhnpehgihhthhhusgdrtghomh enucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpeguvghm ihesihhnvhhishhisghlvghthhhinhhgshhlrggsrdgtohhm X-ME-Proxy: Feedback-ID: iac594737:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 21 Jun 2022 22:27:46 -0400 (EDT) From: Demi Marie Obenour To: Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , David Vrabel , Jennifer Herbert Cc: Demi Marie Obenour , xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v4] xen/gntdev: Avoid blocking in unmap_grant_pages() Date: Tue, 21 Jun 2022 22:27:26 -0400 Message-Id: <20220622022726.2538-1-demi@invisiblethingslab.com> X-Mailer: git-send-email 2.35.3 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" unmap_grant_pages() currently waits for the pages to no longer be used. In https://github.com/QubesOS/qubes-issues/issues/7481, this lead to a deadlock against i915: i915 was waiting for gntdev's MMU notifier to finish, while gntdev was waiting for i915 to free its pages. I also believe this is responsible for various deadlocks I have experienced in the past. Avoid these problems by making unmap_grant_pages async. This requires making it return void, as any errors will not be available when the function returns. Fortunately, the only use of the return value is a WARN_ON(), which can be replaced by a WARN_ON when the error is detected. Additionally, a failed call will not prevent further calls from being made, but this is harmless. Because unmap_grant_pages is now async, the grant handle will be sent to INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same handle. Instead, a separate bool array is allocated for this purpose. This wastes memory, but stuffing this information in padding bytes is too fragile. Furthermore, it is necessary to grab a reference to the map before making the asynchronous call, and release the reference when the call returns. It is also necessary to guard against reentrancy in gntdev_map_put(), and to handle the case where userspace tries to map a mapping whose contents have not all been freed yet. Fixes: 745282256c75 ("xen/gntdev: safely unmap grants in case they are stil= l in use") Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour Reviewed-by: Juergen Gross --- drivers/xen/gntdev-common.h | 7 ++ drivers/xen/gntdev.c | 157 ++++++++++++++++++++++++------------ 2 files changed, 113 insertions(+), 51 deletions(-) diff --git a/drivers/xen/gntdev-common.h b/drivers/xen/gntdev-common.h index 20d7d059dadb..40ef379c28ab 100644 --- a/drivers/xen/gntdev-common.h +++ b/drivers/xen/gntdev-common.h @@ -16,6 +16,7 @@ #include #include #include +#include =20 struct gntdev_dmabuf_priv; =20 @@ -56,6 +57,7 @@ struct gntdev_grant_map { struct gnttab_unmap_grant_ref *unmap_ops; struct gnttab_map_grant_ref *kmap_ops; struct gnttab_unmap_grant_ref *kunmap_ops; + bool *being_removed; struct page **pages; unsigned long pages_vm_start; =20 @@ -73,6 +75,11 @@ struct gntdev_grant_map { /* Needed to avoid allocation in gnttab_dma_free_pages(). */ xen_pfn_t *frames; #endif + + /* Number of live grants */ + atomic_t live_grants; + /* Needed to avoid allocation in __unmap_grant_pages */ + struct gntab_unmap_queue_data unmap_data; }; =20 struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int co= unt, diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 59ffea800079..4b56c39f766d 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -35,6 +35,7 @@ #include #include #include +#include =20 #include #include @@ -60,10 +61,11 @@ module_param(limit, uint, 0644); MODULE_PARM_DESC(limit, "Maximum number of grants that may be mapped by one mapping request"); =20 +/* True in PV mode, false otherwise */ static int use_ptemod; =20 -static int unmap_grant_pages(struct gntdev_grant_map *map, - int offset, int pages); +static void unmap_grant_pages(struct gntdev_grant_map *map, + int offset, int pages); =20 static struct miscdevice gntdev_miscdev; =20 @@ -120,6 +122,7 @@ static void gntdev_free_map(struct gntdev_grant_map *ma= p) kvfree(map->unmap_ops); kvfree(map->kmap_ops); kvfree(map->kunmap_ops); + kvfree(map->being_removed); kfree(map); } =20 @@ -140,10 +143,13 @@ struct gntdev_grant_map *gntdev_alloc_map(struct gntd= ev_priv *priv, int count, add->unmap_ops =3D kvmalloc_array(count, sizeof(add->unmap_ops[0]), GFP_KERNEL); add->pages =3D kvcalloc(count, sizeof(add->pages[0]), GFP_KERNEL); + add->being_removed =3D + kvcalloc(count, sizeof(add->being_removed[0]), GFP_KERNEL); if (NULL =3D=3D add->grants || NULL =3D=3D add->map_ops || NULL =3D=3D add->unmap_ops || - NULL =3D=3D add->pages) + NULL =3D=3D add->pages || + NULL =3D=3D add->being_removed) goto err; if (use_ptemod) { add->kmap_ops =3D kvmalloc_array(count, sizeof(add->kmap_ops[0]), @@ -250,9 +256,36 @@ void gntdev_put_map(struct gntdev_priv *priv, struct g= ntdev_grant_map *map) if (!refcount_dec_and_test(&map->users)) return; =20 - if (map->pages && !use_ptemod) + if (map->pages && !use_ptemod) { + /* + * Increment the reference count. This ensures that the + * subsequent call to unmap_grant_pages() will not wind up + * re-entering itself. It *can* wind up calling + * gntdev_put_map() recursively, but such calls will be with a + * reference count greater than 1, so they will return before + * this code is reached. The recursion depth is thus limited to + * 1. Do NOT use refcount_inc() here, as it will detect that + * the reference count is zero and WARN(). + */ + refcount_set(&map->users, 1); + + /* + * Unmap the grants. This may or may not be asynchronous, so it + * is possible that the reference count is 1 on return, but it + * could also be greater than 1. + */ unmap_grant_pages(map, 0, map->count); =20 + /* Check if the memory now needs to be freed */ + if (!refcount_dec_and_test(&map->users)) + return; + + /* + * All pages have been returned to the hypervisor, so free the + * map. + */ + } + if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(map->notify.event); evtchn_put(map->notify.event); @@ -283,6 +316,7 @@ static int find_grant_ptes(pte_t *pte, unsigned long ad= dr, void *data) =20 int gntdev_map_grant_pages(struct gntdev_grant_map *map) { + size_t alloced =3D 0; int i, err =3D 0; =20 if (!use_ptemod) { @@ -331,97 +365,116 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *= map) map->count); =20 for (i =3D 0; i < map->count; i++) { - if (map->map_ops[i].status =3D=3D GNTST_okay) + if (map->map_ops[i].status =3D=3D GNTST_okay) { map->unmap_ops[i].handle =3D map->map_ops[i].handle; - else if (!err) + if (!use_ptemod) + alloced++; + } else if (!err) err =3D -EINVAL; =20 if (map->flags & GNTMAP_device_map) map->unmap_ops[i].dev_bus_addr =3D map->map_ops[i].dev_bus_addr; =20 if (use_ptemod) { - if (map->kmap_ops[i].status =3D=3D GNTST_okay) + if (map->kmap_ops[i].status =3D=3D GNTST_okay) { + if (map->map_ops[i].status =3D=3D GNTST_okay) + alloced++; map->kunmap_ops[i].handle =3D map->kmap_ops[i].handle; - else if (!err) + } else if (!err) err =3D -EINVAL; } } + atomic_add(alloced, &map->live_grants); return err; } =20 -static int __unmap_grant_pages(struct gntdev_grant_map *map, int offset, - int pages) +static void __unmap_grant_pages_done(int result, + struct gntab_unmap_queue_data *data) { - int i, err =3D 0; - struct gntab_unmap_queue_data unmap_data; - - if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) { - int pgno =3D (map->notify.addr >> PAGE_SHIFT); - if (pgno >=3D offset && pgno < offset + pages) { - /* No need for kmap, pages are in lowmem */ - uint8_t *tmp =3D pfn_to_kaddr(page_to_pfn(map->pages[pgno])); - tmp[map->notify.addr & (PAGE_SIZE-1)] =3D 0; - map->notify.flags &=3D ~UNMAP_NOTIFY_CLEAR_BYTE; - } - } - - unmap_data.unmap_ops =3D map->unmap_ops + offset; - unmap_data.kunmap_ops =3D use_ptemod ? map->kunmap_ops + offset : NULL; - unmap_data.pages =3D map->pages + offset; - unmap_data.count =3D pages; - - err =3D gnttab_unmap_refs_sync(&unmap_data); - if (err) - return err; + unsigned int i; + struct gntdev_grant_map *map =3D data->data; + unsigned int offset =3D data->unmap_ops - map->unmap_ops; =20 - for (i =3D 0; i < pages; i++) { - if (map->unmap_ops[offset+i].status) - err =3D -EINVAL; + for (i =3D 0; i < data->count; i++) { + WARN_ON(map->unmap_ops[offset+i].status); pr_debug("unmap handle=3D%d st=3D%d\n", map->unmap_ops[offset+i].handle, map->unmap_ops[offset+i].status); map->unmap_ops[offset+i].handle =3D INVALID_GRANT_HANDLE; if (use_ptemod) { - if (map->kunmap_ops[offset+i].status) - err =3D -EINVAL; + WARN_ON(map->kunmap_ops[offset+i].status); pr_debug("kunmap handle=3D%u st=3D%d\n", map->kunmap_ops[offset+i].handle, map->kunmap_ops[offset+i].status); map->kunmap_ops[offset+i].handle =3D INVALID_GRANT_HANDLE; } } - return err; + /* + * Decrease the live-grant counter. This must happen after the loop to + * prevent premature reuse of the grants by gnttab_mmap(). + */ + atomic_sub(data->count, &map->live_grants); + + /* Release reference taken by __unmap_grant_pages */ + gntdev_put_map(NULL, map); +} + +static void __unmap_grant_pages(struct gntdev_grant_map *map, int offset, + int pages) +{ + if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) { + int pgno =3D (map->notify.addr >> PAGE_SHIFT); + + if (pgno >=3D offset && pgno < offset + pages) { + /* No need for kmap, pages are in lowmem */ + uint8_t *tmp =3D pfn_to_kaddr(page_to_pfn(map->pages[pgno])); + + tmp[map->notify.addr & (PAGE_SIZE-1)] =3D 0; + map->notify.flags &=3D ~UNMAP_NOTIFY_CLEAR_BYTE; + } + } + + map->unmap_data.unmap_ops =3D map->unmap_ops + offset; + map->unmap_data.kunmap_ops =3D use_ptemod ? map->kunmap_ops + offset : NU= LL; + map->unmap_data.pages =3D map->pages + offset; + map->unmap_data.count =3D pages; + map->unmap_data.done =3D __unmap_grant_pages_done; + map->unmap_data.data =3D map; + refcount_inc(&map->users); /* to keep map alive during async call below */ + + gnttab_unmap_refs_async(&map->unmap_data); } =20 -static int unmap_grant_pages(struct gntdev_grant_map *map, int offset, - int pages) +static void unmap_grant_pages(struct gntdev_grant_map *map, int offset, + int pages) { - int range, err =3D 0; + int range; + + if (atomic_read(&map->live_grants) =3D=3D 0) + return; /* Nothing to do */ =20 pr_debug("unmap %d+%d [%d+%d]\n", map->index, map->count, offset, pages); =20 /* It is possible the requested range will have a "hole" where we * already unmapped some of the grants. Only unmap valid ranges. */ - while (pages && !err) { - while (pages && - map->unmap_ops[offset].handle =3D=3D INVALID_GRANT_HANDLE) { + while (pages) { + while (pages && map->being_removed[offset]) { offset++; pages--; } range =3D 0; while (range < pages) { - if (map->unmap_ops[offset + range].handle =3D=3D - INVALID_GRANT_HANDLE) + if (map->being_removed[offset + range]) break; + map->being_removed[offset + range] =3D true; range++; } - err =3D __unmap_grant_pages(map, offset, range); + if (range) + __unmap_grant_pages(map, offset, range); offset +=3D range; pages -=3D range; } - - return err; } =20 /* ------------------------------------------------------------------ */ @@ -473,7 +526,6 @@ static bool gntdev_invalidate(struct mmu_interval_notif= ier *mn, struct gntdev_grant_map *map =3D container_of(mn, struct gntdev_grant_map, notifier); unsigned long mstart, mend; - int err; =20 if (!mmu_notifier_range_blockable(range)) return false; @@ -494,10 +546,9 @@ static bool gntdev_invalidate(struct mmu_interval_noti= fier *mn, map->index, map->count, map->vma->vm_start, map->vma->vm_end, range->start, range->end, mstart, mend); - err =3D unmap_grant_pages(map, + unmap_grant_pages(map, (mstart - map->vma->vm_start) >> PAGE_SHIFT, (mend - mstart) >> PAGE_SHIFT); - WARN_ON(err); =20 return true; } @@ -985,6 +1036,10 @@ static int gntdev_mmap(struct file *flip, struct vm_a= rea_struct *vma) goto unlock_out; if (use_ptemod && map->vma) goto unlock_out; + if (atomic_read(&map->live_grants)) { + err =3D -EAGAIN; + goto unlock_out; + } refcount_inc(&map->users); =20 vma->vm_ops =3D &gntdev_vmops; --=20 2.36.1