Fix the longstanding race between hv_pci_query_relations() and
survey_child_resources() by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
---
v2:
Removed the "debug code".
No change to the patch body.
Added Cc:stable
v3:
Added Michael's Reviewed-by.
drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index f33370b756283..b82c7cde19e66 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device *hdev)
if (!ret)
ret = wait_for_response(hdev, &comp);
+ /*
+ * In the case of fast device addition/removal, it's possible that
+ * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+ * already got a PCI_BUS_RELATIONS* message from the host and the
+ * channel callback already scheduled a work to hbus->wq, which can be
+ * running survey_child_resources() -> complete(&hbus->survey_event),
+ * even after hv_pci_query_relations() exits and the stack variable
+ * 'comp' is no longer valid. This can cause a strange hang issue
+ * or sometimes a page fault. Flush hbus->wq before we exit from
+ * hv_pci_query_relations() to avoid the issues.
+ */
+ flush_workqueue(hbus->wq);
+
return ret;
}
--
2.25.1
On Wed, Apr 19, 2023 at 07:40:32PM -0700, Dexuan Cui wrote:
> Fix the longstanding race between hv_pci_query_relations() and
> survey_child_resources() by flushing the workqueue before we exit from
> hv_pci_query_relations().
"Fix the longstanding race" is vague. Please describe the race
succinctly at least to give an idea of what the problem is.
> Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
>
> v2:
> Removed the "debug code".
> No change to the patch body.
> Added Cc:stable
>
> v3:
> Added Michael's Reviewed-by.
>
> drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index f33370b756283..b82c7cde19e66 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device *hdev)
> if (!ret)
> ret = wait_for_response(hdev, &comp);
>
> + /*
> + * In the case of fast device addition/removal, it's possible that
> + * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
> + * already got a PCI_BUS_RELATIONS* message from the host and the
> + * channel callback already scheduled a work to hbus->wq, which can be
> + * running survey_child_resources() -> complete(&hbus->survey_event),
> + * even after hv_pci_query_relations() exits and the stack variable
> + * 'comp' is no longer valid. This can cause a strange hang issue
"A strange hang" sounds like we don't understand what's happening, it
does not feel like it is a solid understanding of the issue.
I would remove it - given that you already explain that comp is no
longer valid - that is already a bug that needs fixing.
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> + * or sometimes a page fault. Flush hbus->wq before we exit from
> + * hv_pci_query_relations() to avoid the issues.
> + */
> + flush_workqueue(hbus->wq);
> +
> return ret;
> }
>
> --
> 2.25.1
>
> From: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Sent: Thursday, May 25, 2023 1:14 AM
> To: Dexuan Cui <decui@microsoft.com>
>
> On Wed, Apr 19, 2023 at 07:40:32PM -0700, Dexuan Cui wrote:
> > Fix the longstanding race between hv_pci_query_relations() and
> > survey_child_resources() by flushing the workqueue before we exit from
> > hv_pci_query_relations().
>
> "Fix the longstanding race" is vague. Please describe the race
> succinctly at least to give an idea of what the problem is.
Hi Lorenzo, I'm sorry for the late response -- finally I'm back on the patchset...
I'll use the below commit message in v4:
Since day 1 of the driver, there has been a race between
hv_pci_query_relations() and survey_child_resources(): during fast
device hotplug, hv_pci_query_relations() may error out due to
device-remove and the stack variable 'comp' is no longer valid;
however, pci_devices_present_work() -> survey_child_resources() ->
complete() may be running on another CPU and accessing the no-longer-valid
'comp'. Fix the race by flushing the workqueue before we exit from
hv_pci_query_relations().
I'll also update the comment to share more details of the race:
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3401,6 +3401,24 @@ static int hv_pci_query_relations(struct hv_device *hdev)
if (!ret)
ret = wait_for_response(hdev, &comp);
+ /*
+ * In the case of fast device addition/removal, it's possible that
+ * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+ * already got a PCI_BUS_RELATIONS* message from the host and the
+ * channel callback already scheduled a work to hbus->wq, which can be
+ * running pci_devices_present_work() -> survey_child_resources() ->
+ * complete(&hbus->survey_event), even after hv_pci_query_relations()
+ * exits and the stack variable 'comp' is no longer valid; as a result,
+ * a hang or a page fault may happen when the complete() calls
+ * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from
+ * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is
+ * -ENODEV, there can't be any more work item scheduled to hbus->wq
+ * after the flush_workqueue(): see vmbus_onoffer_rescind() ->
+ * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() ->
+ * channel->rescind = true.
+ */
+ flush_workqueue(hbus->wq);
+
return ret;
}
> > + * In the case of fast device addition/removal, it's possible that
> > + * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
> > + * already got a PCI_BUS_RELATIONS* message from the host and the
> > + * channel callback already scheduled a work to hbus->wq, which can be
> > + * running survey_child_resources() -> complete(&hbus->survey_event),
> > + * even after hv_pci_query_relations() exits and the stack variable
> > + * 'comp' is no longer valid. This can cause a strange hang issue
>
> "A strange hang" sounds like we don't understand what's happening, it
> does not feel like it is a solid understanding of the issue.
>
> I would remove it - given that you already explain that comp is no
> longer valid - that is already a bug that needs fixing.
I share more details in the comment (see the above).
> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Thanks!
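
For readers following the thread, the race and the fix can be modelled in
user space. The sketch below is only an illustration, not the driver code:
struct bus, survey_work(), flush_work_queue() and query_relations() are
hypothetical stand-ins, a pthread plays the role of the work item queued on
hbus->wq, and pthread_join() plays the role of flush_workqueue(). It assumes
the worker reaches the waiter's on-stack completion through a pointer stored
in the bus object.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	/* Takes a lock inside *c: this is what breaks if *c is stale. */
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical stand-in for the bus object shared with the worker. */
struct bus {
	struct completion *survey_event; /* points into the caller's stack */
	pthread_t worker;                /* models one queued work item */
	bool work_queued;
	int completions_delivered;
};

/* Models the work item that ends in complete() on the waiter's variable. */
static void *survey_work(void *arg)
{
	struct bus *b = arg;

	complete(b->survey_event);
	b->completions_delivered++;
	return NULL;
}

/* Models flush_workqueue(): returns only after queued work has finished. */
static void flush_work_queue(struct bus *b)
{
	if (b->work_queued) {
		pthread_join(b->worker, NULL);
		b->work_queued = false;
	}
}

/* Models the query path; 'device_removed' forces the early error exit. */
static int query_relations(struct bus *b, bool device_removed)
{
	struct completion comp; /* on the stack, like 'comp' in the patch */
	int ret = 0;
	int rc;

	init_completion(&comp);
	b->survey_event = &comp;

	/* The "channel callback" has already queued the work item. */
	rc = pthread_create(&b->worker, NULL, survey_work, b);
	assert(rc == 0);
	(void)rc;
	b->work_queued = true;

	if (device_removed)
		ret = -ENODEV;              /* error path: nobody waits */
	else
		wait_for_completion(&comp);

	/* The fix: without this, the worker may touch 'comp' after return. */
	flush_work_queue(b);

	b->survey_event = NULL;
	pthread_cond_destroy(&comp.cond);
	pthread_mutex_destroy(&comp.lock);
	return ret;
}
```

Removing the flush_work_queue() call from query_relations() reproduces the
bug in this model: on the error path the function returns while the worker
may still be inside complete(), locking a mutex in a dead stack frame.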