[BUG report] Deadlock in xen-blkfront upon device hot-unplug

Vitaly Kuznetsov posted 1 patch 2 years, 8 months ago
Failed in applying to current master (apply log)
[BUG report] Deadlock in xen-blkfront upon device hot-unplug
Posted by Vitaly Kuznetsov 2 years, 8 months ago
I'm observing a deadlock every time I try to unplug a xen-blkfront
device. With 5.14-rc1+ the deadlock looks like:

[  513.682405] ============================================
[  513.686617] WARNING: possible recursive locking detected
[  513.691020] 5.14.0-rc1+ #370 Not tainted
[  513.694272] --------------------------------------------
[  513.698528] xenwatch/144 is trying to acquire lock:
[  513.702424] ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: del_gendisk+0x53/0x210
[  513.708768] 
               but task is already holding lock:
[  513.713320] ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: blkback_changed+0x118/0xeb9 [xen_blkfront]
[  513.720369] 
               other info that might help us debug this:
[  513.724901]  Possible unsafe locking scenario:

[  513.729241]        CPU0
[  513.731326]        ----
[  513.733404]   lock(&disk->open_mutex);
[  513.736679]   lock(&disk->open_mutex);
[  513.739988] 
                *** DEADLOCK ***

[  513.745524]  May be due to missing lock nesting notation

[  513.751438] 2 locks held by xenwatch/144:
[  513.755344]  #0: ffffffff8c9f3c70 (xenwatch_mutex){+.+.}-{3:3}, at: xenwatch_thread+0xe6/0x190
[  513.762137]  #1: ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: blkback_changed+0x118/0xeb9 [xen_blkfront]
[  513.770381] 
               stack backtrace:
[  513.774785] CPU: 1 PID: 144 Comm: xenwatch Not tainted 5.14.0-rc1+ #370
[  513.780131] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[  513.785097] Call Trace:
[  513.787920]  dump_stack_lvl+0x6a/0x9a
[  513.791223]  __lock_acquire.cold+0x14a/0x2ba
[  513.794918]  ? mark_held_locks+0x50/0x80
[  513.798453]  lock_acquire+0xd3/0x2f0
[  513.801819]  ? del_gendisk+0x53/0x210
[  513.805334]  ? kernfs_put.part.0+0xe8/0x1b0
[  513.808905]  ? del_gendisk+0x53/0x210
[  513.812230]  __mutex_lock+0x8d/0x8c0
[  513.815415]  ? del_gendisk+0x53/0x210
[  513.818931]  ? kernfs_put.part.0+0xe8/0x1b0
[  513.822594]  del_gendisk+0x53/0x210
[  513.825782]  xlvbd_release_gendisk+0x6f/0xb0 [xen_blkfront]
[  513.830186]  blkback_changed+0x20e/0xeb9 [xen_blkfront]
[  513.834458]  ? xenbus_read_driver_state+0x39/0x60
[  513.838540]  xenwatch_thread+0x94/0x190
[  513.841936]  ? do_wait_intr_irq+0xb0/0xb0
[  513.845451]  ? xenbus_dev_request_and_reply+0x90/0x90
[  513.849885]  kthread+0x149/0x170
[  513.853039]  ? set_kthread_struct+0x40/0x40
[  513.857027]  ret_from_fork+0x22/0x30

My suspicion is that the problem was introduced by:

commit c76f48eb5c084b1e15c931ae8cc1826cd771d70d
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 6 08:22:56 2021 +0200

    block: take bd_mutex around delete_partitions in del_gendisk

blkfront_closing() takes '&bdev->bd_disk->open_mutex' around
xlvbd_release_gendisk() call which in its turn calls del_gendisk() which
after the above mentioned commit tries to take the same mutex. I may be
completely wrong though.

If I try to avoid taking the mutex from blkfront_closing(): 

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8d49f8fa98bb..9af6831492d4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2145,8 +2145,6 @@ static void blkfront_closing(struct blkfront_info *info)
                return;
        }
 
-       mutex_lock(&bdev->bd_disk->open_mutex);
-
        if (bdev->bd_openers) {
                xenbus_dev_error(xbdev, -EBUSY,
                                 "Device in use; refusing to close");
@@ -2156,7 +2154,6 @@ static void blkfront_closing(struct blkfront_info *info)
                xenbus_frontend_closed(xbdev);
        }
 
-       mutex_unlock(&bdev->bd_disk->open_mutex);
        bdput(bdev);
 }
 
the situation becomes even worse:

[   74.371465] general protection fault, probably for non-canonical address 0xb0fa8ce8ee8a2234: 0000 [#1] SMP PTI
[   74.381294] CPU: 3 PID: 144 Comm: xenwatch Not tainted 5.14.0-rc1+ #370
[   74.386172] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[   74.390918] RIP: 0010:del_timer+0x1f/0x80
[   74.394282] Code: 71 af a3 00 eb c1 31 c0 c3 66 90 0f 1f 44 00 00 41 55 41 54 45 31 e4 55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 <48> 8b 47 08 48 85 c0 74 2d 48 89 e6 48 89 fd e8 dd e8 ff ff 48 89
[   74.407591] RSP: 0018:ffffbab68423bcc8 EFLAGS: 00010082
[   74.411691] RAX: dd931e09aefb8f00 RBX: b0fa8ce8ee8a21dc RCX: 0000000000005e7f
[   74.417041] RDX: 0000000000005e80 RSI: 0000000000000001 RDI: b0fa8ce8ee8a222c
[   74.422425] RBP: ffffbab68423bd20 R08: 0000000000000001 R09: 0000000000000001
[   74.427595] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
[   74.432886] R13: ffffa0484f3e4000 R14: 0000000000000000 R15: ffffa0484691c000
[   74.438784] FS:  0000000000000000(0000) GS:ffffa083c8e00000(0000) knlGS:0000000000000000
[   74.444592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.448917] CR2: 00007ff618903ff8 CR3: 0000000111e16001 CR4: 00000000001706e0
[   74.454309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.460128] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   74.465872] Call Trace:
[   74.468467]  try_to_grab_pending+0x13f/0x2e0
[   74.472202]  cancel_delayed_work+0x2e/0xd0
[   74.475980]  blk_mq_stop_hw_queues+0x2d/0x50
[   74.479732]  blkfront_remove+0x40/0x210 [xen_blkfront]
[   74.484154]  xenbus_dev_remove+0x6d/0xf0
[   74.487872]  __device_release_driver+0x180/0x240
[   74.491561]  device_release_driver+0x26/0x40
[   74.497134]  bus_remove_device+0xef/0x160
[   74.500180]  device_del+0x18c/0x3e0
[   74.503451]  ? xenbus_probe_devices+0x120/0x120
[   74.506975]  ? klist_iter_exit+0x14/0x20
[   74.511649]  device_unregister+0x13/0x60
[   74.515237]  xenbus_dev_changed+0x174/0x1e0
[   74.518923]  xenwatch_thread+0x94/0x190
[   74.522208]  ? do_wait_intr_irq+0xb0/0xb0
[   74.525690]  ? xenbus_dev_request_and_reply+0x90/0x90
[   74.529973]  kthread+0x149/0x170
[   74.533007]  ? set_kthread_struct+0x40/0x40
[   74.537023]  ret_from_fork+0x22/0x30
[   74.540411] Modules linked in: vfat fat i2c_piix4 xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel xen_blkfront ghash_clmulni_intel ena
[   74.549144] ---[ end trace 296bd6f709c05e9e ]---

At this point I can only say that something is certainly
wrong. Apologies if this is an already known problem.

-- 
Vitaly


Re: [BUG report] Deadlock in xen-blkfront upon device hot-unplug
Posted by Christoph Hellwig 2 years, 8 months ago
On Thu, Jul 15, 2021 at 11:16:30AM +0200, Vitaly Kuznetsov wrote:
> I'm observing a deadlock every time I try to unplug a xen-blkfront
> device. With 5.14-rc1+ the deadlock looks like:

I did actually stumble over this a few days ago just from code
inspection.  Below is what I come up with, can you give it a spin?

---
From 4fef3b917af51153dd99a958ee9064f0de3a8b6a Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 29 Jun 2021 13:49:22 +0200
Subject: xen-blkfront: sanitize the removal state machine

xen-blkfront has a weird protocol where close message from the remote
side can be delayed, and where hot removals are treated somewhat
differently from regular removals, all leading to potential NULL
pointer removals.  Fix this by just performing normal hot removals
even when the device is opened like all other Linux block drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/xen-blkfront.c | 182 +++--------------------------------
 1 file changed, 16 insertions(+), 166 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8d49f8fa98bb..0ba37c97aabd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -502,34 +502,21 @@ static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
 static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 		       unsigned command, unsigned long argument)
 {
-	struct blkfront_info *info = bdev->bd_disk->private_data;
 	int i;
 
-	dev_dbg(&info->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
-		command, (long)argument);
-
 	switch (command) {
 	case CDROMMULTISESSION:
-		dev_dbg(&info->xbdev->dev, "FIXME: support multisession CDs later\n");
 		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
 			if (put_user(0, (char __user *)(argument + i)))
 				return -EFAULT;
 		return 0;
-
-	case CDROM_GET_CAPABILITY: {
-		struct gendisk *gd = info->gd;
-		if (gd->flags & GENHD_FL_CD)
+	case CDROM_GET_CAPABILITY:
+		if (bdev->bd_disk->flags & GENHD_FL_CD)
 			return 0;
 		return -EINVAL;
-	}
-
 	default:
-		/*printk(KERN_ALERT "ioctl %08x not supported by Xen blkdev\n",
-		  command);*/
-		return -EINVAL; /* same return as native Linux */
+		return -EINVAL;
 	}
-
-	return 0;
 }
 
 static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
@@ -1179,11 +1166,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
-	unsigned int minor, nr_minors, i;
 	struct blkfront_ring_info *rinfo;
-
-	if (info->rq == NULL)
-		return;
+	unsigned int i;
 
 	/* No more blkif_request(). */
 	blk_mq_stop_hw_queues(info->rq);
@@ -1198,12 +1182,8 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 
 	del_gendisk(info->gd);
 
-	minor = info->gd->first_minor;
-	nr_minors = info->gd->minors;
-	xlbd_release_minors(minor, nr_minors);
-
+	xlbd_release_minors(info->gd->first_minor, info->gd->minors);
 	blk_cleanup_disk(info->gd);
-	info->gd = NULL;
 	blk_mq_free_tag_set(&info->tag_set);
 }
 
@@ -2126,38 +2106,16 @@ static int blkfront_resume(struct xenbus_device *dev)
 static void blkfront_closing(struct blkfront_info *info)
 {
 	struct xenbus_device *xbdev = info->xbdev;
-	struct block_device *bdev = NULL;
-
-	mutex_lock(&info->mutex);
-
-	if (xbdev->state == XenbusStateClosing) {
-		mutex_unlock(&info->mutex);
-		return;
-	}
 
-	if (info->gd)
-		bdev = bdgrab(info->gd->part0);
-
-	mutex_unlock(&info->mutex);
-
-	if (!bdev) {
-		xenbus_frontend_closed(xbdev);
+	if (xbdev->state == XenbusStateClosing)
 		return;
-	}
 
-	mutex_lock(&bdev->bd_disk->open_mutex);
-
-	if (bdev->bd_openers) {
-		xenbus_dev_error(xbdev, -EBUSY,
-				 "Device in use; refusing to close");
-		xenbus_switch_state(xbdev, XenbusStateClosing);
-	} else {
+	mutex_lock(&blkfront_mutex);
+	if (info->gd->flags & GENHD_FL_UP)
 		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(xbdev);
-	}
-
-	mutex_unlock(&bdev->bd_disk->open_mutex);
-	bdput(bdev);
+	mutex_unlock(&blkfront_mutex);
+	
+	xenbus_frontend_closed(xbdev);
 }
 
 static void blkfront_setup_discard(struct blkfront_info *info)
@@ -2472,8 +2430,7 @@ static void blkback_changed(struct xenbus_device *dev,
 			break;
 		fallthrough;
 	case XenbusStateClosing:
-		if (info)
-			blkfront_closing(info);
+		blkfront_closing(info);
 		break;
 	}
 }
@@ -2481,55 +2438,16 @@ static void blkback_changed(struct xenbus_device *dev,
 static int blkfront_remove(struct xenbus_device *xbdev)
 {
 	struct blkfront_info *info = dev_get_drvdata(&xbdev->dev);
-	struct block_device *bdev = NULL;
-	struct gendisk *disk;
 
 	dev_dbg(&xbdev->dev, "%s removed", xbdev->nodename);
 
-	if (!info)
-		return 0;
-
 	blkif_free(info, 0);
 
-	mutex_lock(&info->mutex);
-
-	disk = info->gd;
-	if (disk)
-		bdev = bdgrab(disk->part0);
-
-	info->xbdev = NULL;
-	mutex_unlock(&info->mutex);
-
-	if (!bdev) {
-		mutex_lock(&blkfront_mutex);
-		free_info(info);
-		mutex_unlock(&blkfront_mutex);
-		return 0;
-	}
-
-	/*
-	 * The xbdev was removed before we reached the Closed
-	 * state. See if it's safe to remove the disk. If the bdev
-	 * isn't closed yet, we let release take care of it.
-	 */
-
-	mutex_lock(&disk->open_mutex);
-	info = disk->private_data;
-
-	dev_warn(disk_to_dev(disk),
-		 "%s was hot-unplugged, %d stale handles\n",
-		 xbdev->nodename, bdev->bd_openers);
-
-	if (info && !bdev->bd_openers) {
+	mutex_lock(&blkfront_mutex);
+	if (info->gd->flags & GENHD_FL_UP)
 		xlvbd_release_gendisk(info);
-		disk->private_data = NULL;
-		mutex_lock(&blkfront_mutex);
-		free_info(info);
-		mutex_unlock(&blkfront_mutex);
-	}
-
-	mutex_unlock(&disk->open_mutex);
-	bdput(bdev);
+	free_info(info);
+	mutex_unlock(&blkfront_mutex);
 
 	return 0;
 }
@@ -2541,77 +2459,9 @@ static int blkfront_is_ready(struct xenbus_device *dev)
 	return info->is_ready && info->xbdev;
 }
 
-static int blkif_open(struct block_device *bdev, fmode_t mode)
-{
-	struct gendisk *disk = bdev->bd_disk;
-	struct blkfront_info *info;
-	int err = 0;
-
-	mutex_lock(&blkfront_mutex);
-
-	info = disk->private_data;
-	if (!info) {
-		/* xbdev gone */
-		err = -ERESTARTSYS;
-		goto out;
-	}
-
-	mutex_lock(&info->mutex);
-
-	if (!info->gd)
-		/* xbdev is closed */
-		err = -ERESTARTSYS;
-
-	mutex_unlock(&info->mutex);
-
-out:
-	mutex_unlock(&blkfront_mutex);
-	return err;
-}
-
-static void blkif_release(struct gendisk *disk, fmode_t mode)
-{
-	struct blkfront_info *info = disk->private_data;
-	struct xenbus_device *xbdev;
-
-	mutex_lock(&blkfront_mutex);
-	if (disk->part0->bd_openers)
-		goto out_mutex;
-
-	/*
-	 * Check if we have been instructed to close. We will have
-	 * deferred this request, because the bdev was still open.
-	 */
-
-	mutex_lock(&info->mutex);
-	xbdev = info->xbdev;
-
-	if (xbdev && xbdev->state == XenbusStateClosing) {
-		/* pending switch to state closed */
-		dev_info(disk_to_dev(disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(info->xbdev);
- 	}
-
-	mutex_unlock(&info->mutex);
-
-	if (!xbdev) {
-		/* sudden device removal */
-		dev_info(disk_to_dev(disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		disk->private_data = NULL;
-		free_info(info);
-	}
-
-out_mutex:
-	mutex_unlock(&blkfront_mutex);
-}
-
 static const struct block_device_operations xlvbd_block_fops =
 {
 	.owner = THIS_MODULE,
-	.open = blkif_open,
-	.release = blkif_release,
 	.getgeo = blkif_getgeo,
 	.ioctl = blkif_ioctl,
 	.compat_ioctl = blkdev_compat_ptr_ioctl,
-- 
2.30.2


Re: [BUG report] Deadlock in xen-blkfront upon device hot-unplug
Posted by Vitaly Kuznetsov 2 years, 8 months ago
Christoph Hellwig <hch@lst.de> writes:

> On Thu, Jul 15, 2021 at 11:16:30AM +0200, Vitaly Kuznetsov wrote:
>> I'm observing a deadlock every time I try to unplug a xen-blkfront
>> device. With 5.14-rc1+ the deadlock looks like:
>
> I did actually stumble over this a few days ago just from code
> inspection.  Below is what I come up with, can you give it a spin?

This eliminates the deadlock, thanks! Unfortunately, this reveals the
same issue I observed when I just dropped taking the mutex from
blkfront_closing():

[   66.455122] general protection fault, probably for non-canonical address 0xf1af5e354e6da159: 0000 [#1] SMP PTI
[   66.462802] CPU: 4 PID: 145 Comm: xenwatch Not tainted 5.14.0-rc1+ #370
[   66.467486] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[   66.472817] RIP: 0010:del_timer+0x1f/0x80
[   66.476570] Code: 71 af a3 00 eb c1 31 c0 c3 66 90 0f 1f 44 00 00 41 55 41 54 45 31 e4 55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 <48> 8b 47 08 48 85 c0 74 2d 48 89 e6 48 89 fd e8 dd e8 ff ff 48 89
[   66.493967] RSP: 0018:ffffb5c10426bcd8 EFLAGS: 00010086
[   66.499416] RAX: a49b3c9544841100 RBX: f1af5e354e6da101 RCX: 0000000000005ebf
[   66.506386] RDX: 0000000000005ec0 RSI: 0000000000000001 RDI: f1af5e354e6da151
[   66.512799] RBP: ffffb5c10426bd30 R08: 0000000000000001 R09: 0000000000000001
[   66.518372] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
[   66.523681] R13: ffff9aba8df63e40 R14: 0000000000000000 R15: ffff9aba86f40000
[   66.529365] FS:  0000000000000000(0000) GS:ffff9af609200000(0000) knlGS:0000000000000000
[   66.536187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.540806] CR2: 00007ff024600130 CR3: 000000010117a006 CR4: 00000000001706e0
[   66.546345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.552322] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.558501] Call Trace:
[   66.561449]  try_to_grab_pending+0x13f/0x2e0
[   66.565658]  cancel_delayed_work+0x2e/0xd0
[   66.570012]  blk_mq_stop_hw_queues+0x2d/0x50
[   66.574110]  blkfront_remove+0x30/0x130 [xen_blkfront]
[   66.579049]  xenbus_dev_remove+0x6d/0xf0
[   66.582473]  __device_release_driver+0x180/0x240
[   66.586963]  device_release_driver+0x26/0x40
[   66.591050]  bus_remove_device+0xef/0x160
[   66.594805]  device_del+0x18c/0x3e0
[   66.598570]  ? xenbus_probe_devices+0x120/0x120
[   66.602987]  ? klist_iter_exit+0x14/0x20
[   66.606915]  device_unregister+0x13/0x60
[   66.611135]  xenbus_dev_changed+0x174/0x1e0
[   66.615104]  xenwatch_thread+0x94/0x190
[   66.619028]  ? do_wait_intr_irq+0xb0/0xb0
[   66.623052]  ? xenbus_dev_request_and_reply+0x90/0x90
[   66.628218]  kthread+0x149/0x170
[   66.631509]  ? set_kthread_struct+0x40/0x40
[   66.635355]  ret_from_fork+0x22/0x30
[   66.639162] Modules linked in: vfat fat i2c_piix4 xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel xen_blkfront ghash_clmulni_intel ena
[   66.650868] ---[ end trace 7fa9da1e39697767 ]---
[   66.655490] RIP: 0010:del_timer+0x1f/0x80
[   66.659813] Code: 71 af a3 00 eb c1 31 c0 c3 66 90 0f 1f 44 00 00 41 55 41 54 45 31 e4 55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 <48> 8b 47 08 48 85 c0 74 2d 48 89 e6 48 89 fd e8 dd e8 ff ff 48 89
[   66.681045] RSP: 0018:ffffb5c10426bcd8 EFLAGS: 00010086
[   66.685888] RAX: a49b3c9544841100 RBX: f1af5e354e6da101 RCX: 0000000000005ebf
[   66.692153] RDX: 0000000000005ec0 RSI: 0000000000000001 RDI: f1af5e354e6da151
[   66.698778] RBP: ffffb5c10426bd30 R08: 0000000000000001 R09: 0000000000000001
[   66.705175] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
[   66.711696] R13: ffff9aba8df63e40 R14: 0000000000000000 R15: ffff9aba86f40000
[   66.718035] FS:  0000000000000000(0000) GS:ffff9af609200000(0000) knlGS:0000000000000000
[   66.725210] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.730291] CR2: 00007ff024600130 CR3: 000000010117a006 CR4: 00000000001706e0
[   66.736235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.742373] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.749026] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[   66.756118] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 145, name: xenwatch
[   66.763473] INFO: lockdep is turned off.
[   66.767428] irq event stamp: 24256
[   66.770900] hardirqs last  enabled at (24255): [<ffffffff90c32aeb>] _raw_spin_unlock_irqrestore+0x4b/0x5d
[   66.779620] hardirqs last disabled at (24256): [<ffffffff900fe21c>] try_to_grab_pending+0x15c/0x2e0
[   66.787763] softirqs last  enabled at (24196): [<ffffffff900e0b11>] __irq_exit_rcu+0xe1/0x100
[   66.794519] softirqs last disabled at (24191): [<ffffffff900e0b11>] __irq_exit_rcu+0xe1/0x100
[   66.801953] CPU: 4 PID: 145 Comm: xenwatch Tainted: G      D           5.14.0-rc1+ #370
[   66.809315] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[   66.814924] Call Trace:
[   66.817461]  dump_stack_lvl+0x6a/0x9a
[   66.821171]  ___might_sleep.cold+0xb6/0xc6
[   66.825436]  exit_signals+0x1c/0x2d0
[   66.829328]  do_exit+0xc7/0xbb0
[   66.832364]  ? kthread+0x149/0x170
[   66.835039]  rewind_stack_do_exit+0x17/0x20
[   66.838710] RIP: 0000:0x0
[   66.841671] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[   66.847271] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[   66.854271] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   66.860672] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   66.866687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   66.872966] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   66.878410] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

-- 
Vitaly


Re: [BUG report] Deadlock in xen-blkfront upon device hot-unplug
Posted by Christoph Hellwig 2 years, 8 months ago
On Thu, Jul 15, 2021 at 03:17:37PM +0200, Vitaly Kuznetsov wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > On Thu, Jul 15, 2021 at 11:16:30AM +0200, Vitaly Kuznetsov wrote:
> >> I'm observing a deadlock every time I try to unplug a xen-blkfront
> >> device. With 5.14-rc1+ the deadlock looks like:
> >
> > I did actually stumble over this a few days ago just from code
> > inspection.  Below is what I come up with, can you give it a spin?
> 
> This eliminates the deadlock, thanks! Unfortunately, this reveals the
> same issue I observed when I just dropped taking the mutex from
> blkfront_closing():

Yeah, this still left too much cruft in blkfront_closing.  Can you
try this version instead?

---
From 4d926a44f15d2051909132182345754df23cc13d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 29 Jun 2021 13:49:22 +0200
Subject: xen-blkfront: sanitize the removal state machine

xen-blkfront has a weird protocol where close message from the remote
side can be delayed, and where hot removals are treated somewhat
differently from regular removals, all leading to potential NULL
pointer removals.  Fix this by just performing normal hot removals
even when the device is opened like all other Linux block drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/xen-blkfront.c | 208 ++++-------------------------------
 1 file changed, 24 insertions(+), 184 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8d49f8fa98bb..09ceb7bd585a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -502,34 +502,21 @@ static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
 static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 		       unsigned command, unsigned long argument)
 {
-	struct blkfront_info *info = bdev->bd_disk->private_data;
 	int i;
 
-	dev_dbg(&info->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
-		command, (long)argument);
-
 	switch (command) {
 	case CDROMMULTISESSION:
-		dev_dbg(&info->xbdev->dev, "FIXME: support multisession CDs later\n");
 		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
 			if (put_user(0, (char __user *)(argument + i)))
 				return -EFAULT;
 		return 0;
-
-	case CDROM_GET_CAPABILITY: {
-		struct gendisk *gd = info->gd;
-		if (gd->flags & GENHD_FL_CD)
+	case CDROM_GET_CAPABILITY:
+		if (bdev->bd_disk->flags & GENHD_FL_CD)
 			return 0;
 		return -EINVAL;
-	}
-
 	default:
-		/*printk(KERN_ALERT "ioctl %08x not supported by Xen blkdev\n",
-		  command);*/
-		return -EINVAL; /* same return as native Linux */
+		return -EINVAL;
 	}
-
-	return 0;
 }
 
 static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
@@ -1177,36 +1164,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	return err;
 }
 
-static void xlvbd_release_gendisk(struct blkfront_info *info)
-{
-	unsigned int minor, nr_minors, i;
-	struct blkfront_ring_info *rinfo;
-
-	if (info->rq == NULL)
-		return;
-
-	/* No more blkif_request(). */
-	blk_mq_stop_hw_queues(info->rq);
-
-	for_each_rinfo(info, rinfo, i) {
-		/* No more gnttab callback work. */
-		gnttab_cancel_free_callback(&rinfo->callback);
-
-		/* Flush gnttab callback work. Must be done with no locks held. */
-		flush_work(&rinfo->work);
-	}
-
-	del_gendisk(info->gd);
-
-	minor = info->gd->first_minor;
-	nr_minors = info->gd->minors;
-	xlbd_release_minors(minor, nr_minors);
-
-	blk_cleanup_disk(info->gd);
-	info->gd = NULL;
-	blk_mq_free_tag_set(&info->tag_set);
-}
-
 /* Already hold rinfo->ring_lock. */
 static inline void kick_pending_request_queues_locked(struct blkfront_ring_info *rinfo)
 {
@@ -2126,38 +2083,26 @@ static int blkfront_resume(struct xenbus_device *dev)
 static void blkfront_closing(struct blkfront_info *info)
 {
 	struct xenbus_device *xbdev = info->xbdev;
-	struct block_device *bdev = NULL;
-
-	mutex_lock(&info->mutex);
+	struct blkfront_ring_info *rinfo;
+	unsigned int i;
 
-	if (xbdev->state == XenbusStateClosing) {
-		mutex_unlock(&info->mutex);
+	if (xbdev->state == XenbusStateClosing)
 		return;
-	}
-
-	if (info->gd)
-		bdev = bdgrab(info->gd->part0);
 
-	mutex_unlock(&info->mutex);
-
-	if (!bdev) {
-		xenbus_frontend_closed(xbdev);
-		return;
-	}
+	/* No more blkif_request(). */
+	blk_mq_stop_hw_queues(info->rq);
+	blk_set_queue_dying(info->rq);
+	set_capacity(info->gd, 0);
 
-	mutex_lock(&bdev->bd_disk->open_mutex);
+	for_each_rinfo(info, rinfo, i) {
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
 
-	if (bdev->bd_openers) {
-		xenbus_dev_error(xbdev, -EBUSY,
-				 "Device in use; refusing to close");
-		xenbus_switch_state(xbdev, XenbusStateClosing);
-	} else {
-		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(xbdev);
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
 	}
 
-	mutex_unlock(&bdev->bd_disk->open_mutex);
-	bdput(bdev);
+	xenbus_frontend_closed(xbdev);
 }
 
 static void blkfront_setup_discard(struct blkfront_info *info)
@@ -2472,8 +2417,7 @@ static void blkback_changed(struct xenbus_device *dev,
 			break;
 		fallthrough;
 	case XenbusStateClosing:
-		if (info)
-			blkfront_closing(info);
+		blkfront_closing(info);
 		break;
 	}
 }
@@ -2481,55 +2425,19 @@ static void blkback_changed(struct xenbus_device *dev,
 static int blkfront_remove(struct xenbus_device *xbdev)
 {
 	struct blkfront_info *info = dev_get_drvdata(&xbdev->dev);
-	struct block_device *bdev = NULL;
-	struct gendisk *disk;
 
 	dev_dbg(&xbdev->dev, "%s removed", xbdev->nodename);
 
-	if (!info)
-		return 0;
+	del_gendisk(info->gd);
 
 	blkif_free(info, 0);
+	xlbd_release_minors(info->gd->first_minor, info->gd->minors);
+	blk_cleanup_disk(info->gd);
+	blk_mq_free_tag_set(&info->tag_set);
 
-	mutex_lock(&info->mutex);
-
-	disk = info->gd;
-	if (disk)
-		bdev = bdgrab(disk->part0);
-
-	info->xbdev = NULL;
-	mutex_unlock(&info->mutex);
-
-	if (!bdev) {
-		mutex_lock(&blkfront_mutex);
-		free_info(info);
-		mutex_unlock(&blkfront_mutex);
-		return 0;
-	}
-
-	/*
-	 * The xbdev was removed before we reached the Closed
-	 * state. See if it's safe to remove the disk. If the bdev
-	 * isn't closed yet, we let release take care of it.
-	 */
-
-	mutex_lock(&disk->open_mutex);
-	info = disk->private_data;
-
-	dev_warn(disk_to_dev(disk),
-		 "%s was hot-unplugged, %d stale handles\n",
-		 xbdev->nodename, bdev->bd_openers);
-
-	if (info && !bdev->bd_openers) {
-		xlvbd_release_gendisk(info);
-		disk->private_data = NULL;
-		mutex_lock(&blkfront_mutex);
-		free_info(info);
-		mutex_unlock(&blkfront_mutex);
-	}
-
-	mutex_unlock(&disk->open_mutex);
-	bdput(bdev);
+	mutex_lock(&blkfront_mutex);
+	free_info(info);
+	mutex_unlock(&blkfront_mutex);
 
 	return 0;
 }
@@ -2541,77 +2449,9 @@ static int blkfront_is_ready(struct xenbus_device *dev)
 	return info->is_ready && info->xbdev;
 }
 
-static int blkif_open(struct block_device *bdev, fmode_t mode)
-{
-	struct gendisk *disk = bdev->bd_disk;
-	struct blkfront_info *info;
-	int err = 0;
-
-	mutex_lock(&blkfront_mutex);
-
-	info = disk->private_data;
-	if (!info) {
-		/* xbdev gone */
-		err = -ERESTARTSYS;
-		goto out;
-	}
-
-	mutex_lock(&info->mutex);
-
-	if (!info->gd)
-		/* xbdev is closed */
-		err = -ERESTARTSYS;
-
-	mutex_unlock(&info->mutex);
-
-out:
-	mutex_unlock(&blkfront_mutex);
-	return err;
-}
-
-static void blkif_release(struct gendisk *disk, fmode_t mode)
-{
-	struct blkfront_info *info = disk->private_data;
-	struct xenbus_device *xbdev;
-
-	mutex_lock(&blkfront_mutex);
-	if (disk->part0->bd_openers)
-		goto out_mutex;
-
-	/*
-	 * Check if we have been instructed to close. We will have
-	 * deferred this request, because the bdev was still open.
-	 */
-
-	mutex_lock(&info->mutex);
-	xbdev = info->xbdev;
-
-	if (xbdev && xbdev->state == XenbusStateClosing) {
-		/* pending switch to state closed */
-		dev_info(disk_to_dev(disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(info->xbdev);
- 	}
-
-	mutex_unlock(&info->mutex);
-
-	if (!xbdev) {
-		/* sudden device removal */
-		dev_info(disk_to_dev(disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		disk->private_data = NULL;
-		free_info(info);
-	}
-
-out_mutex:
-	mutex_unlock(&blkfront_mutex);
-}
-
 static const struct block_device_operations xlvbd_block_fops =
 {
 	.owner = THIS_MODULE,
-	.open = blkif_open,
-	.release = blkif_release,
 	.getgeo = blkif_getgeo,
 	.ioctl = blkif_ioctl,
 	.compat_ioctl = blkdev_compat_ptr_ioctl,
-- 
2.30.2


Re: [BUG report] Deadlock in xen-blkfront upon device hot-unplug
Posted by Vitaly Kuznetsov 2 years, 8 months ago
Christoph Hellwig <hch@lst.de> writes:

> On Thu, Jul 15, 2021 at 03:17:37PM +0200, Vitaly Kuznetsov wrote:
>> Christoph Hellwig <hch@lst.de> writes:
>> 
>> > On Thu, Jul 15, 2021 at 11:16:30AM +0200, Vitaly Kuznetsov wrote:
>> >> I'm observing a deadlock every time I try to unplug a xen-blkfront
>> >> device. With 5.14-rc1+ the deadlock looks like:
>> >
>> > I did actually stumble over this a few days ago just from code
>> > inspection.  Below is what I come up with, can you give it a spin?
>> 
>> This eliminates the deadlock, thanks! Unfortunately, this reveals the
>> same issue I observed when I just dropped taking the mutex from
>> blkfront_closing():
>
> Yeah, this still left too much cruft in blkfront_closing.  Can you
> try this version instead?
>

This one seems to work fine for me! Feel free to throw

Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

in. Thanks a lot!

-- 
Vitaly