On RISC-V the APLIC serves part of the GSI interrupts, but unlike on
other architectures it's initialized a bit late on ACPI based
systems:
- the spec only mandates reporting it in the DSDT (riscv-brs rule AML_100),
so the APLIC is created as a platform_device when scanning the DSDT
- the driver is registered and initializes the device at the
device_initcall stage
The creation of devices that depend on the APLIC is deferred until after
the APLIC is initialized (when the driver calls acpi_dev_clear_dependencies),
unlike most other devices which are created when scanning the DSDT.
The affected devices include those that declare the dependency explicitly
via the ACPI _DEP method (and _PRT for the PCIe host bridge) and those
that require their interrupts as GSIs. Furthermore, the deferred creation
is performed asynchronously (queued on the system_dfl_wq workqueue),
but all the queued works contend on the acpi_scan_lock.
Since the deferred device creation is asynchronous and contends for
the same lock, the order and timing are not certain. And it is late
enough for the device creation to run in parallel with the init
task. This leads to the issues below (also observed on our platforms):
- the console/tty device is created late and sometimes it's not ready
when the init task checks for its presence. The system will crash in
the latter case since the init task always requires a valid console.
- the root device is probed and registered late (e.g. NVMe, after
the init task has executed) and we may end up in the rescue shell if
the root device is not found.
We'll run into these issues more often in linuxboot since the init
task is simpler (usually u-root) and checks for the console/root
devices earlier.
Solve this by promoting the APLIC driver registration to core_initcall,
which is prior to the APLIC device creation, so the dependency for
the GSI is met earlier. The key system devices like tty/PCI will be
created earlier, when scanning the ACPI namespace in a synchronous
manner, and won't run in parallel with the init task. So a console/root
device is certain to exist when the init task runs.
Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
---
drivers/irqchip/irq-riscv-aplic-main.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
index 93e7c51f944a..86a3d19b6b24 100644
--- a/drivers/irqchip/irq-riscv-aplic-main.c
+++ b/drivers/irqchip/irq-riscv-aplic-main.c
@@ -231,4 +231,16 @@ static struct platform_driver aplic_driver = {
},
.probe = aplic_probe,
};
-builtin_platform_driver(aplic_driver);
+
+static int __init aplic_driver_init(void)
+{
+ return platform_driver_register(&aplic_driver);
+}
+
+/*
+ * The APLIC serves part of the GSI interrupts and some key system devices
+ * like TTY/PCI depend on its initialization. Register the driver prior to
+ * the APLIC device creation (on ACPI it's created in subsys_initcall when
+ * scanning the namespace devices) to make the GSI service ready early.
+ */
+core_initcall(aplic_driver_init);
--
2.34.1
On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@picoheart.com> wrote:
>
> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
> other arthitecture it's initialized a bit late on ACPI based
> system:
> - the spec only mandates the report in DSDT (riscv-brs rule AML_100)
> so the APLIC is created as platform_device when scanning DSDT
> - the driver is registered and initialize the device in device_initcall
> stage
>
> The creation of devices depends on APLIC is deferred after the APLIC
> is initialized (when the driver calls acpi_dev_clear_dependencies),
> not like most other devices which is created when scanning the DSDT.
> The affected devices include those declare the dependency explicitly
> by ACPI _DEP method and _PRT for PCIe host bridge and those require
> their interrupts as GSI. Furhtermore, the deferred creation is
> performed in an async way (queued in the system_dfl_wq workqueue)
> but all contend on the acpi_scan_lock.
>
> Since the deferred devcie creation is asynchronous and will contend
> for the same lock, the order and timing is not certain. And the time
> is late enough for the device creation running parallel with the init
> task. This will lead to below issues (also observed on our platforms):
> - the console/tty device is created lately and sometimes it's not ready
> when init task check for its presence. the system will crash in the
> latter case since the init task always requires a valid console.
> - the root device will by probed and registered lately (e.g. NVME,
> after the init task executed) and may run into the rescue shell if
> root device is not found.
>
> We'll run into the issues more often in linuxboot since the init tasks
> is more simpler (usually u-root) and will check for the console/root
> devices more earlier.
>
> Solve this by promote the APLIC driver register stage to core_initcall
> which is prior to the APLIC device creation. So the dependency for
> the GSI is met earlier. The key system devices like tty/PCI will be
> created earlier when scanning ACPI namespace in a synchronous manner
> and won't be parallel with the init task. So it's certain to have
> a console/root device when the init task running.
Changing the driver registration priority is not going to help. For DT,
we should rely on fw_devlink to ensure APLIC is probed before
drivers consuming APLIC interrupts. For ACPI in the RISC-V world,
the APLIC probe ordering is handled using GSI mappings and _DEP objects.
There was a recent discussion on this, so refer to:
https://www.spinics.net/lists/kernel/msg5938816.html
Regards,
Anup
>
> Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
> ---
> drivers/irqchip/irq-riscv-aplic-main.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> index 93e7c51f944a..86a3d19b6b24 100644
> --- a/drivers/irqchip/irq-riscv-aplic-main.c
> +++ b/drivers/irqchip/irq-riscv-aplic-main.c
> @@ -231,4 +231,16 @@ static struct platform_driver aplic_driver = {
> },
> .probe = aplic_probe,
> };
> -builtin_platform_driver(aplic_driver);
> +
> +static int __init aplic_driver_init(void)
> +{
> + return platform_driver_register(&aplic_driver);
> +}
> +
> +/*
> + * APLIC serves part of GSI interrupts and some key system devices like
> + * TTY/PCI depends on its initialization. Register the driver prior to
> + * APLIC device (on ACPI it's created in subsys_initcall when scanning
> + * the namespace devices) to make the GSI service ready early.
> + */
> +core_initcall(aplic_driver_init);
> --
> 2.34.1
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Anup,
On 1/14/26 4:57 PM, Anup Patel wrote:
> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@picoheart.com> wrote:
>>
>> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
>> other arthitecture it's initialized a bit late on ACPI based
>> system:
>> - the spec only mandates the report in DSDT (riscv-brs rule AML_100)
>> so the APLIC is created as platform_device when scanning DSDT
>> - the driver is registered and initialize the device in device_initcall
>> stage
>>
>> The creation of devices depends on APLIC is deferred after the APLIC
>> is initialized (when the driver calls acpi_dev_clear_dependencies),
>> not like most other devices which is created when scanning the DSDT.
>> The affected devices include those declare the dependency explicitly
>> by ACPI _DEP method and _PRT for PCIe host bridge and those require
>> their interrupts as GSI. Furhtermore, the deferred creation is
>> performed in an async way (queued in the system_dfl_wq workqueue)
>> but all contend on the acpi_scan_lock.
>>
>> Since the deferred devcie creation is asynchronous and will contend
>> for the same lock, the order and timing is not certain. And the time
>> is late enough for the device creation running parallel with the init
>> task. This will lead to below issues (also observed on our platforms):
>> - the console/tty device is created lately and sometimes it's not ready
>> when init task check for its presence. the system will crash in the
>> latter case since the init task always requires a valid console.
>> - the root device will by probed and registered lately (e.g. NVME,
>> after the init task executed) and may run into the rescue shell if
>> root device is not found.
>>
>> We'll run into the issues more often in linuxboot since the init tasks
>> is more simpler (usually u-root) and will check for the console/root
>> devices more earlier.
>>
>> Solve this by promote the APLIC driver register stage to core_initcall
>> which is prior to the APLIC device creation. So the dependency for
>> the GSI is met earlier. The key system devices like tty/PCI will be
>> created earlier when scanning ACPI namespace in a synchronous manner
>> and won't be parallel with the init task. So it's certain to have
>> a console/root device when the init task running.
>
> Changing the driver registration priority is not going to help. For DT,
> we should rely on fw_devlink to ensure APLIC is probed before
> drivers consuming APLIC interrupts. For ACPI in the RISC-V world,
> the APLIC probe ordering using GSI mappings and _DEP objects.
>
> There was a recent discussion on this so refer:
> https://www.spinics.net/lists/kernel/msg5938816.html
>
Thanks for the reference. The problem there is different (though their
problems should also be blamed on the asynchronous device creation). Our
problem is that the devices depending on the APLIC are created late,
*in parallel* with the init task, so sometimes they're not even created
(e.g. PCIe host bridge, tty) when the init task reaches the stage of
checking for these devices (as described in the commit).
As for ACPI, this patch isn't going to change the probe ordering; the
dependency is still described by the GSI mappings, _DEP or _PRT methods
and honored. It only makes the related devices created earlier and in a
synchronous manner. Currently the creation of devices that depend on the
APLIC (take the PCIe host bridge as an example) looks like below:
[init thread] [workqueue N]
// subsys_initcall
acpi_init()
acpi_arch_init()
// create GSI mappings
riscv_acpi_init_gsi_mapping()
acpi_bus_scan()
[...]
acpi_walk_namespace(acpi_bus_check_add_1)
// devices depend on APLIC, add to
// acpi_dep_list for deferred creation
acpi_scan_check_dep()
acpi_scan_add_dep()
// create acpi_device for APLIC or
// other independent device
acpi_add_single_object()
acpi_bus_attach()
// APLIC or other independent device
acpi_create_platform_device()
acpi_scan_postponed()
// create acpi_device for APLIC depended
// devices, e.g. PCI host bridge
acpi_add_single_object()
[...]
// device_initcall
platform_driver_register(&aplic_driver)
driver_attach()
// probe and init APLIC
aplic_probe()
[...]
acpi_dev_clear_dependencies()
// create work for each device
queue_work() acpi_scan_clear_dep_fn
[...] acpi_scan_lock_acquire() // will compete with other
// device creation
// later initcalls then enter init task.                        // e.g. for PCIe host bridges
acpi_pci_root_add() // create PCIe Root which
// is *parallel* with init
acpi_scan_lock_release()
But if we register the driver earlier, the APLIC will be probed and
initialized early in the first DSDT scan. Since the dependency is then
met, the other devices that depend on the APLIC will be created in the
second DSDT scan in a synchronous way:
[init thread]
// core_initcall
platform_driver_register(&aplic_driver)
// subsys_initcall
acpi_init()
acpi_arch_init()
// create GSI mappings
riscv_acpi_init_gsi_mapping()
acpi_bus_scan()
[...]
acpi_walk_namespace(acpi_bus_check_add_1)
// devices depend on APLIC, add to
// acpi_dep_list for deferred creation
acpi_scan_check_dep()
acpi_scan_add_dep()
// create acpi_device for APLIC or
// other independent device
acpi_add_single_object()
acpi_bus_attach()
// APLIC or other independent device
acpi_create_platform_device()
aplic_probe() // driver's registered, probe directly
[...]
acpi_dev_clear_dependencies()
acpi_scan_clear_dep()
// acpi_device for e.g. PCI is not created
// so won't queue the clear_dep work. only
// mark the dependency met here
if (acpi_device)
acpi_scan_clear_dep_queue()
acpi_scan_postponed()
acpi_walk_namespace(acpi_bus_check_add_2)
// create acpi_device for APLIC depended
// devices, e.g. PCI host bridge
acpi_add_single_object()
acpi_bus_attach()
acpi_pci_root_add() // e.g. PCIe host bridge created
// acpi_create_platform_device() for
// other platform devices
// later initcalls then enter init task
With the above, since the initcalls and the init task execution are
serialized, at least the basic devices like the tty platform devices
and the PCIe host bridges are created.
Thanks.
On Wed, Jan 14 2026 at 19:48, Yicong Yang wrote:
> On 1/14/26 4:57 PM, Anup Patel wrote:
>> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@picoheart.com> wrote:
>>>
>>> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
>>> other arthitecture it's initialized a bit late on ACPI based
>>> system:
>>> - the spec only mandates the report in DSDT (riscv-brs rule AML_100)
>>> so the APLIC is created as platform_device when scanning DSDT
>>> - the driver is registered and initialize the device in device_initcall
>>> stage
>>>
>>> The creation of devices depends on APLIC is deferred after the APLIC
>>> is initialized (when the driver calls acpi_dev_clear_dependencies),
>>> not like most other devices which is created when scanning the DSDT.
>>> The affected devices include those declare the dependency explicitly
>>> by ACPI _DEP method and _PRT for PCIe host bridge and those require
>>> their interrupts as GSI. Furhtermore, the deferred creation is
>>> performed in an async way (queued in the system_dfl_wq workqueue)
>>> but all contend on the acpi_scan_lock.
The lock contention is irrelevant to the real underlying problem.
>>> Since the deferred devcie creation is asynchronous and will contend
>>> for the same lock, the order and timing is not certain. And the time
>>> is late enough for the device creation running parallel with the init
>>> task. This will lead to below issues (also observed on our platforms):
>>> - the console/tty device is created lately and sometimes it's not ready
>>> when init task check for its presence. the system will crash in the
>>> latter case since the init task always requires a valid console.
>>> - the root device will by probed and registered lately (e.g. NVME,
>>> after the init task executed) and may run into the rescue shell if
>>> root device is not found.
And again, you _cannot_ solve this problem completely with initcall
ordering.
Deferred probing with delegation to work queues has the systemic
issue that there is no guarantee that all devices, which are required
to actually proceed to userspace, have been initialized at that
point.
Changing the initcall priority of a particular driver papers over the
underlying problem to the extent that _you_ cannot observe it anymore,
but that provides exactly _zero_ guarantee that it is correct under all
circumstances. "Works for me" is the worst engineering principle as you
might know already.
That said, I still refuse to take random initcall ordering patches
unless somebody comes up with a coherent explanation of the actual
guarantee.
But before you start to come up with more fairy tales, let me come back
to your two points from above:
>>> - the console/tty device is created lately and sometimes it's not ready
>>> when init task check for its presence. the system will crash in the
>>> latter case since the init task always requires a valid console.
I assume you want to say that console_on_rootfs() fails to open
'/dev/console', right?
That's obvious because console_on_rootfs() is invoked _before_
async_synchronize_full() is invoked which ensures that all outstanding
initialization work has been completed.
The fix for this is obvious too and it's therefore bloody obvious that
changing the init call priority of a random driver does not fix that at
all, no?
But that's not sufficient, see below.
>>> - the root device will by probed and registered lately (e.g. NVME,
>>> after the init task executed) and may run into the rescue shell if
>>> root device is not found.
You completely fail to explain how outstanding initializations in work
queues survive past the async_synchronize_full() synchronization
point. You are merely describing random observations on your system, but
you stopped right there without trying to decode the underlying root
cause.
The root cause is:
1) as I already said above, deferred probing does not provide any
guarantees at all.
2) async_synchronize_full() is obviously not the barrier which it is
supposed to be (the misplaced console_on_rootfs() call aside).
That needs to be fixed at the conceptual level and not hacked around
with "works for me" patches and fairy tale change logs.
Thanks,
tglx
On 1/15/26 3:50 AM, Thomas Gleixner wrote:
> On Wed, Jan 14 2026 at 19:48, Yicong Yang wrote:
>> On 1/14/26 4:57 PM, Anup Patel wrote:
>>> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@picoheart.com> wrote:
>>>> [...]
>
> The lock contention is irrelevant to the real underlying problem.
>
>>>> [...]
>
> And again, you _cannot_ solve this problem completely with initcall
> ordering;
>
> Deferred probing with delegation to work queues has the systemic
> issue that there is no guarantee that all devices, which are required
> to actually proceed to userspace, have been initialized at that
> point.
>
> Changing the initcall priority of a particular driver papers over the
> underlying problem to the extent that _you_ cannot observe it anymore,
> but that provides exactly _zero_ guarantee that it is correct under all
> circumstances. "Works for me" is the worst engineering principle as you
> might know already.
>
> That said, I still refuse to take random initcall ordering patches
> unless somebody comes up with a coherent explanation of the actual
> guarantee.

ok, I see the points and it's reasonable to me. thanks.

> But before you start to come up with more fairy tales, let me come back
> to your two points from above:
>
>>>> - the console/tty device is created lately and sometimes it's not ready
>>>> when init task check for its presence. the system will crash in the
>>>> latter case since the init task always requires a valid console.
>
> I assume you want to say that console_on_rootfs() fails to open
> '/dev/console', right?

right.

> That's obvious because console_on_rootfs() is invoked _before_
> async_synchronize_full() is invoked which ensures that all outstanding
> initialization work has been completed.

it seems problematic to put console_on_rootfs() before
async_synchronize_full() (as you point out), but my issue is not caused
by it directly. I think you're right that we should do the
synchronization and make use of async_synchronize_full(); illustrated
below.

> The fix for this is obvious too and it's therefore bloody obvious that
> changing the init call priority of a random driver does not fix that at
> all, no?
>
> But that's not sufficient, see below.
>
>>>> - the root device will by probed and registered lately (e.g. NVME,
>>>> after the init task executed) and may run into the rescue shell if
>>>> root device is not found.
>
> You completely fail to explain how outstanding initializations in work
> queues survive past the async_synchronize_full() synchronization
> point. You are merely describing random observations on your system, but
> you stopped right there without trying to decode the underlying root
> cause.

For devices depending on the APLIC, the platform_device (tty, PCIe root)
creation is deferred to the stage where the APLIC driver calls
acpi_dev_clear_dependencies(). That iterates the dependency list and
queues each device creation on system_dfl_wq in
acpi_scan_clear_dep_queue() [1], so the later driver probe is also
performed on system_dfl_wq. async_synchronize_full() synchronizes all
the works in async_wq but not other workqueues; that's why
async_synchronize_full() fails to synchronize the creation/probe of
these devices before the init process. Please correct me if there's
any mistake.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/scan.c?h=v6.19-rc5#n2400

> The root cause is:
>
> 1) as I already said above that deferred probing does not provide any
> guarantees at all.
>
> 2) async_synchronize_full() is obviously not the barrier which it is
> supposed to be (the misplaced console_on_rootfs() call aside).
>
> That needs to be fixed at the conceptual level and not hacked around
> with "works for me" patches and fairy tale change logs.

So based on the above, if we use async_wq (with the async_schedule*
APIs) in acpi_scan_clear_dep_queue() for creating these devices, the
issue could be solved, since the async_synchronize_full() barrier then
makes sure these devices exist before entering userspace. This should
be a solution with conceptual support, and a quick test on our platform
shows it solves the issue.

As for the order of console_on_rootfs()/async_synchronize_full(),
though our issue is not directly caused by it, it would theoretically
cause the same issue (by the time the console is opened the async
probing may not be finished) and needs to be fixed, right?

Thanks.
On Thu, Jan 15 2026 at 16:31, Yicong Yang wrote:
> so based on above, if we use async_wq (with async_schedule* APIs) in
> acpi_scan_clear_dep_queue() for creating these devices, the issue
> could be solved since we're sure to have these devices before entering
> userspace, since the barrier of async_synchronize_full(). This should be
> a solution with a conceptual support and I did a quick test on our
> platform it solves the issue.
Sounds about right to me. The drivers core and ACPI folks might have
opinions though :)
> As for the order of console_on_rootfs()/async_synchronize_full(),
> though our issue is not directly caused by it, it will cause the
> same issue (by the console open time the async probing maybe not
> finised) theoretically and needs to be fixed, is it?
Yes, that should move past the synchronization point.
Thanks,
tglx
On 1/15/26 9:28 PM, Thomas Gleixner wrote:
> On Thu, Jan 15 2026 at 16:31, Yicong Yang wrote:
>> so based on above, if we use async_wq (with async_schedule* APIs) in
>> acpi_scan_clear_dep_queue() for creating these devices, the issue
>> could be solved since we're sure to have these devices before entering
>> userspace, since the barrier of async_synchronize_full(). This should be
>> a solution with a conceptual support and I did a quick test on our
>> platform it solves the issue.
>
> Sounds about right to me. The drivers core and ACPI folks might have
> opinions though :)

sure, I'll wait a bit to see if there's further comment before sending
out the next version.

>> As for the order of console_on_rootfs()/async_synchronize_full(),
>> though our issue is not directly caused by it, it will cause the
>> same issue (by the console open time the async probing maybe not
>> finised) theoretically and needs to be fixed, is it?
>
> Yes, that should move past the synchronization point.

will include this fix as a separate patch since they're two separate
issues. thanks a lot for the useful discussion helping to find a better
solution :)

Thanks.
On Fri, Jan 16 2026 at 14:16, Yicong Yang wrote:
> On 1/15/26 9:28 PM, Thomas Gleixner wrote:
>> On Thu, Jan 15 2026 at 16:31, Yicong Yang wrote:
>>> so based on above, if we use async_wq (with async_schedule* APIs) in
>>> acpi_scan_clear_dep_queue() for creating these devices, the issue
>>> could be solved since we're sure to have these devices before entering
>>> userspace, since the barrier of async_synchronize_full(). This should be
>>> a solution with a conceptual support and I did a quick test on our
>>> platform it solves the issue.
>>
>> Sounds about right to me. The drivers core and ACPI folks might have
>> opinions though :)
>>
> sure I'll wait a bit to see if there's further comment before sending out
> next version.
Btw, there is a reason that this is on the default work queue. See
commit dc612486c919 ("ACPI: scan: Fix device object rescan in
acpi_scan_clear_dep()")
for details. So this needs some more thought.
Thanks,
tglx