From nobody Thu May 16 16:43:20 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass header.i=@amazon.de; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=quarantine dis=none) header.from=amazon.de ARC-Seal: i=1; a=rsa-sha256; t=1706113950; cv=none; d=zohomail.com; s=zohoarc; b=KhjnAL7pk+B9TYj/puHKpiYDo2rUlpSYuzIQXj0j4is6am8S0JB4kCQYM/FlQXVdck0rJN4KLTwD0XNf7cnWbKwO3aiZIUuU9hwF3OxFCdEgugOinzRFHvgI8BA1o+mE5gCNNPl8DbDecSSqFHuqvaj+URj9dlwdVYN0zDi0JY4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1706113950; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:Sender:Subject:Subject:Message-Id:Reply-To:To; bh=7vT1PswudYg3wT0G4RuvjsUk701LiPWja9I6tIWZQRY=; b=a/5165Vib6J9XTHCigaRrDl77g8mEQTk1GieeT5qWESBsltMJD4Td3uyevLatlKqjANF/ngzxMyz/Rpw5OgNDanrJdvi7njTxVYBGwRktXLLgonaq7Csu0y2f9k3+i7J/SKUl604JVNV2i2+UJAGjMPavMtjES1tgR1EiMpN6ds= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=@amazon.de; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1706113950622572.8742881721984; Wed, 24 Jan 2024 08:32:30 -0800 (PST) Received: from list by lists.xenproject.org with outflank-mailman.671154.1044349 (Exim 4.92) (envelope-from ) id 1rSgAa-0003HN-Dd; Wed, 24 Jan 2024 16:32:08 +0000 Received: by outflank-mailman (output) from mailman id 671154.1044349; Wed, 24 Jan 2024 16:32:08 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1rSgAa-0003HG-AC; Wed, 24 Jan 2024 16:32:08 +0000 Received: by outflank-mailman (input) for mailman id 671154; Wed, 24 Jan 2024 16:32:07 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1rSgAZ-0003HA-2N for xen-devel@lists.xenproject.org; Wed, 24 Jan 2024 16:32:07 +0000 Received: from smtp-fw-52005.amazon.com (smtp-fw-52005.amazon.com [52.119.213.156]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 1b759d33-bad6-11ee-98f5-6d05b1d4d9a1; Wed, 24 Jan 2024 17:32:05 +0100 (CET) Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-iad-1e-m6i4x-3554bfcf.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-52005.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2024 16:32:02 +0000 Received: from smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev (iad7-ws-svc-p70-lb3-vlan2.iad.amazon.com [10.32.235.34]) by email-inbound-relay-iad-1e-m6i4x-3554bfcf.us-east-1.amazon.com (Postfix) with ESMTPS id 641D7806A9; Wed, 24 Jan 2024 16:32:00 +0000 (UTC) Received: from EX19MTAUEA001.ant.amazon.com [10.0.44.209:41371] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.88.120:2525] with esmtp (Farcaster) id 439588db-d258-4371-8e65-95409cade95b; Wed, 24 Jan 2024 16:31:59 +0000 (UTC) Received: from EX19D008UEA003.ant.amazon.com (10.252.134.116) by EX19MTAUEA001.ant.amazon.com (10.252.134.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Wed, 24 Jan 2024 16:31:49 +0000 Received: from EX19MTAUWC001.ant.amazon.com (10.250.64.145) by EX19D008UEA003.ant.amazon.com (10.252.134.116) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Wed, 24 Jan 2024 16:31:48 +0000 Received: from dev-dsk-mheyne-1b-c1362c4d.eu-west-1.amazon.com (10.15.57.183) by mail-relay.amazon.com (10.250.64.145) with Microsoft SMTP Server id 15.2.1118.40 via Frontend Transport; Wed, 24 Jan 2024 16:31:48 +0000 Received: by dev-dsk-mheyne-1b-c1362c4d.eu-west-1.amazon.com (Postfix, from userid 5466572) id 37CB1ADF; Wed, 24 Jan 2024 16:31:48 +0000 (UTC) X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Inumbo-ID: 1b759d33-bad6-11ee-98f5-6d05b1d4d9a1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.de; i=@amazon.de; q=dns/txt; s=amazon201209; t=1706113925; x=1737649925; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=7vT1PswudYg3wT0G4RuvjsUk701LiPWja9I6tIWZQRY=; b=CNkDA6LvxIBfTlyEwqlfSXPNmBGHBovLb1WLNGLwie/sQSLlJvouxmNn OeI63NIwC2F8p10Heas1BZwPr1dxHQXwCkaWiZlihBQLho8sdmiO+93zI 9ISo0xMyeLsewiCAvaCrsexNqeEgZjP6gflG3CF0piGE+1aQC1TEFpXAu A=; X-IronPort-AV: E=Sophos;i="6.05,216,1701129600"; d="scan'208";a="629728689" X-Farcaster-Flow-ID: 439588db-d258-4371-8e65-95409cade95b From: Maximilian Heyne To: CC: Julien Grall , Maximilian Heyne , , Andrew Panyakin , "Juergen Gross" , Stefano Stabellini , Oleksandr Tyshchenko , Rahul Singh , David Woodhouse , Viresh Kumar , Konrad Rzeszutek Wilk , Jeremy Fitzhardinge , , Subject: [PATCH] xen/events: close evtchn after mapping cleanup Date: Wed, 24 Jan 2024 16:31:28 +0000 Message-ID: <20240124163130.31324-1-mheyne@amazon.de> X-Mailer: git-send-email 2.40.1 MIME-Version: 1.0 Precedence: Bulk Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity @amazon.de) X-ZM-MESSAGEID: 1706113952114100001 Content-Type: text/plain; charset="utf-8" shutdown_pirq and startup_pirq are not taking the irq_mapping_update_lock because they can't due to lock inversion. Both are called with the irq_desc->lock being taking. The lock order, however, is first irq_mapping_update_lock and then irq_desc->lock. This opens multiple races: - shutdown_pirq can be interrupted by a function that allocates an event channel: CPU0 CPU1 shutdown_pirq { xen_evtchn_close(e) __startup_pirq { EVTCHNOP_bind_pirq -> returns just freed evtchn e set_evtchn_to_irq(e, irq) } xen_irq_info_cleanup() { set_evtchn_to_irq(e, -1) } } Assume here event channel e refers here to the same event channel number. After this race the evtchn_to_irq mapping for e is invalid (-1). - __startup_pirq races with __unbind_from_irq in a similar way. Because __startup_pirq doesn't take irq_mapping_update_lock it can grab the evtchn that __unbind_from_irq is currently freeing and cleaning up. In this case even though the event channel is allocated, its mapping can be unset in evtchn_to_irq. The fix is to first cleanup the mappings and then close the event channel. In this way, when an event channel gets allocated it's potential previous evtchn_to_irq mappings are guaranteed to be unset alread= y. This is also the reverse order of the allocation where first the event channel is allocated and then the mappings are setup. On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal [un]bind interfaces"), we hit a BUG like the following during probing of NV= Me devices. The issue is that during nvme_setup_io_queues, pci_free_irq is called for every device which results in a call to shutdown_pirq. With many nvme devices it's therefore likely to hit this race during boot because there will be multiple calls to shutdown_pirq and startup_pirq are running potentially in parallel. ------------[ cut here ]------------ blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; i= ndirect descriptors: enabled; bounce buffer: enabled kernel BUG at drivers/xen/events/events_base.c:499! invalid opcode: 0000 [#1] SMP PTI CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2= .x86_64 #1 Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006 Workqueue: nvme-reset-wq nvme_reset_work RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0 Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0= f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e= 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006 RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000= 000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? show_trace_log_lvl+0x1c1/0x2d9 ? show_trace_log_lvl+0x1c1/0x2d9 ? set_affinity_irq+0xdc/0x1c0 ? __die_body.cold+0x8/0xd ? die+0x2b/0x50 ? do_trap+0x90/0x110 ? bind_evtchn_to_cpu+0xdf/0xf0 ? do_error_trap+0x65/0x80 ? bind_evtchn_to_cpu+0xdf/0xf0 ? exc_invalid_op+0x4e/0x70 ? bind_evtchn_to_cpu+0xdf/0xf0 ? asm_exc_invalid_op+0x12/0x20 ? bind_evtchn_to_cpu+0xdf/0xf0 ? bind_evtchn_to_cpu+0xc5/0xf0 set_affinity_irq+0xdc/0x1c0 irq_do_set_affinity+0x1d7/0x1f0 irq_setup_affinity+0xd6/0x1a0 irq_startup+0x8a/0xf0 __setup_irq+0x639/0x6d0 ? nvme_suspend+0x150/0x150 request_threaded_irq+0x10c/0x180 ? nvme_suspend+0x150/0x150 pci_request_irq+0xa8/0xf0 ? __blk_mq_free_request+0x74/0xa0 queue_request_irq+0x6f/0x80 nvme_create_queue+0x1af/0x200 nvme_create_io_queues+0xbd/0xf0 nvme_setup_io_queues+0x246/0x320 ? nvme_irq_check+0x30/0x30 nvme_reset_work+0x1c8/0x400 process_one_work+0x1b0/0x350 worker_thread+0x49/0x310 ? process_one_work+0x350/0x350 kthread+0x11b/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Modules linked in: ---[ end trace a11715de1eee1873 ]--- Fixes: d46a78b05c0e ("xen: implement pirq type event channels") Cc: stable@vger.kernel.org Co-debugged-by: Andrew Panyakin Signed-off-by: Maximilian Heyne Reviewed-by: Juergen Gross --- drivers/xen/events/events_base.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_b= ase.c index b8cfea7812d6..3b9f080109d7 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_data *data) return; =20 do_mask(info, EVT_MASK_REASON_EXPLICIT); - xen_evtchn_close(evtchn); xen_irq_info_cleanup(info); + xen_evtchn_close(evtchn); } =20 static void enable_pirq(struct irq_data *data) @@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi); static void __unbind_from_irq(struct irq_info *info, unsigned int irq) { evtchn_port_t evtchn; + bool close_evtchn =3D false; =20 if (!info) { xen_irq_free_desc(irq); @@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq_info *info, un= signed int irq) struct xenbus_device *dev; =20 if (!info->is_static) - xen_evtchn_close(evtchn); + close_evtchn =3D true; =20 switch (info->type) { case IRQT_VIRQ: @@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq_info *info, un= signed int irq) } =20 xen_irq_info_cleanup(info); + + if (close_evtchn) + xen_evtchn_close(evtchn); } =20 xen_free_irq(info); --=20 2.40.1 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879