[PATCH] resource: Improve child resource handling in release_mem_region_adjustable()

Sumanth Korikkar posted 1 patch 3 weeks ago
There is a newer version of this series
kernel/resource.c | 44 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 39 insertions(+), 5 deletions(-)
[PATCH] resource: Improve child resource handling in release_mem_region_adjustable()
Posted by Sumanth Korikkar 3 weeks ago
When memory block is removed via try_remove_memory(), it eventually
reaches release_mem_region_adjustable(). The current implementation
assumes that when a busy memory resource is split into two, all child
resources remain in the lower address range.

This simplification causes problems when child resources actually belong
to the upper split. For example:

* Initial memory layout:
lsmem
RANGE                                 SIZE   STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000002ffffffff  12G  online       yes   0-95

* /proc/iomem
00000000-2dfefffff : System RAM
  158834000-1597b3fff : Kernel code
  1597b4000-159f50fff : Kernel data
  15a13c000-15a218fff : Kernel bss
2dff00000-2ffefffff : Crash kernel
2fff00000-2ffffffff : System RAM

* After offlining and removing range
  0x150000000-0x157ffffff
lsmem
RANGE                                  SIZE   STATE REMOVABLE  BLOCK
0x0000000000000000-0x000000014fffffff  5.3G  online       yes   0-41
0x0000000150000000-0x0000000157ffffff  128M offline               42
0x0000000158000000-0x00000002ffffffff  6.6G  online       yes  43-95

The iomem resource gets split into two entries, but kernel code, kernel
data, and kernel bss remain attached to the lower resource [0–5376M]
instead of the correct upper resource [5504M–12288M].

As a result, WARN_ON() triggers in release_mem_region_adjustable()
("Usecase: split into two entries - we need a new resource")
------------[ cut here ]------------
WARNING: CPU: 5 PID: 858 at kernel/resource.c:1486
release_mem_region_adjustable+0x210/0x280
Modules linked in:
CPU: 5 UID: 0 PID: 858 Comm: chmem Not tainted 6.17.0-rc2-11707-g2c36aaf3ba4e
Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
Krnl PSW : 0704d00180000000 0000024ec0dae0e4
           (release_mem_region_adjustable+0x214/0x280)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 00000002ffffafc0 fffffffffffffff0 0000000000000000
           000000014fffffff 0000024ec2257608 0000000000000000 0000024ec2301758
           0000024ec22680d0 00000000902c9140 0000000150000000 00000002ffffafc0
           000003ffa61d8d18 0000024ec21fb478 0000024ec0dae014 000001cec194fbb0
Krnl Code: 0000024ec0dae0d8: af000000            mc      0,0
           0000024ec0dae0dc: a7f4ffc1            brc     15,0000024ec0dae05e
          #0000024ec0dae0e0: af000000            mc      0,0
          >0000024ec0dae0e4: a5defffd            llilh   %r13,65533
           0000024ec0dae0e8: c04000c6064c        larl    %r4,0000024ec266ed80
           0000024ec0dae0ee: eb1d400000f8        laa     %r1,%r13,0(%r4)
           0000024ec0dae0f4: 07e0                bcr     14,%r0
           0000024ec0dae0f6: a7f4ffc0            brc     15,0000024ec0dae076
Call Trace:
 [<0000024ec0dae0e4>] release_mem_region_adjustable+0x214/0x280
([<0000024ec0dadf3c>] release_mem_region_adjustable+0x6c/0x280)
 [<0000024ec10a2130>] try_remove_memory+0x100/0x140
 [<0000024ec10a4052>] __remove_memory+0x22/0x40
 [<0000024ec18890f6>] config_mblock_store+0x326/0x3e0
 [<0000024ec11f7056>] kernfs_fop_write_iter+0x136/0x210
 [<0000024ec1121e86>] vfs_write+0x236/0x3c0
 [<0000024ec11221b8>] ksys_write+0x78/0x110
 [<0000024ec1b6bfbe>] __do_syscall+0x12e/0x350
 [<0000024ec1b782ce>] system_call+0x6e/0x90
Last Breaking-Event-Address:
 [<0000024ec0dae014>] release_mem_region_adjustable+0x144/0x280
---[ end trace 0000000000000000 ]---

Also, resource adjustment doesn't happen and stale resources still cover
[0-12288M].  Later, memory re-add fails in register_memory_resource()
with -EBUSY.

i.e: /proc/iomem is still:
00000000-2dfefffff : System RAM
  158834000-1597b3fff : Kernel code
  1597b4000-159f50fff : Kernel data
  15a13c000-15a218fff : Kernel bss
2dff00000-2ffefffff : Crash kernel
2fff00000-2ffffffff : System RAM

Enhance release_mem_region_adjustable() to reassign child resources
to the correct parent after a split. Children are now assigned based on
their actual range: If they fall within the lower split, keep them in
the lower parent. If they fall within the upper split, move them to the
upper parent.

Kernel code/data/bss regions are not offlined, so they will always
reside entirely within one parent and never span across both.

Output after the enhancement:
* Initial state /proc/iomem (before removal of memory block):
00000000-2dfefffff : System RAM
  1f94f8000-1fa477fff : Kernel code
  1fa478000-1fac14fff : Kernel data
  1fae00000-1faedcfff : Kernel bss
2dff00000-2ffefffff : Crash kernel
2fff00000-2ffffffff : System RAM

* Offline and remove 0x1e8000000-0x1efffffff memory range
* /proc/iomem
00000000-1e7ffffff : System RAM
1f0000000-2dfefffff : System RAM
  1f94f8000-1fa477fff : Kernel code
  1fa478000-1fac14fff : Kernel data
  1fae00000-1faedcfff : Kernel bss
2dff00000-2ffefffff : Crash kernel
2fff00000-2ffffffff : System RAM

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 kernel/resource.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index f9bb5481501a..c329b8a4aa2f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1388,6 +1388,41 @@ void __release_region(struct resource *parent, resource_size_t start,
 EXPORT_SYMBOL(__release_region);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+static void append_child_to_parent(struct resource *new_parent, struct resource *new_child)
+{
+	struct resource *child;
+
+	child = new_parent->child;
+	if (child) {
+		while (child->sibling)
+			child = child->sibling;
+		child->sibling = new_child;
+	} else {
+		new_parent->child = new_child;
+	}
+	new_child->parent = new_parent;
+	new_child->sibling = NULL;
+}
+
+static void move_children_to_parent(struct resource *old_parent,
+				    struct resource *new_parent,
+				    resource_size_t split_addr)
+{
+	struct resource *child, *next, **p;
+
+	p = &old_parent->child;
+	while ((child = *p)) {
+		next = child->sibling;
+		if (child->start > split_addr) {
+			/* unlink child */
+			*p = next;
+			append_child_to_parent(new_parent, child);
+		} else {
+			p = &child->sibling;
+		}
+	}
+}
+
 /**
  * release_mem_region_adjustable - release a previously reserved memory region
  * @start: resource start address
@@ -1397,15 +1432,13 @@ EXPORT_SYMBOL(__release_region);
  * is released from a currently busy memory resource.  The requested region
  * must either match exactly or fit into a single busy resource entry.  In
  * the latter case, the remaining resource is adjusted accordingly.
- * Existing children of the busy memory resource must be immutable in the
- * request.
  *
  * Note:
  * - Additional release conditions, such as overlapping region, can be
  *   supported after they are confirmed as valid cases.
- * - When a busy memory resource gets split into two entries, the code
- *   assumes that all children remain in the lower address entry for
- *   simplicity.  Enhance this logic when necessary.
+ * - When a busy memory resource gets split into two entries, its children is
+ *   reassigned to the correct parent based on their range. If a child memory
+ *   resource overlaps with more than one parent, enhance the logic as needed.
  */
 void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
 {
@@ -1482,6 +1515,7 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
 			new_res->parent = res->parent;
 			new_res->sibling = res->sibling;
 			new_res->child = NULL;
+			move_children_to_parent(res, new_res, end);
 
 			if (WARN_ON_ONCE(__adjust_resource(res, res->start,
 							   start - res->start)))
-- 
2.48.1

Re: [PATCH] resource: Improve child resource handling in release_mem_region_adjustable()
Posted by David Hildenbrand 3 weeks ago
On 11.09.25 16:00, Sumanth Korikkar wrote:

Hi,

> When memory block is removed via try_remove_memory(), it eventually
> reaches release_mem_region_adjustable(). The current implementation
> assumes that when a busy memory resource is split into two, all child
> resources remain in the lower address range.
> 
> This simplification causes problems when child resources actually belong
> to the upper split. For example:
> 
> * Initial memory layout:
> lsmem
> RANGE                                 SIZE   STATE REMOVABLE  BLOCK
> 0x0000000000000000-0x00000002ffffffff  12G  online       yes   0-95
> 
> * /proc/iomem
> 00000000-2dfefffff : System RAM
>    158834000-1597b3fff : Kernel code
>    1597b4000-159f50fff : Kernel data
>    15a13c000-15a218fff : Kernel bss
> 2dff00000-2ffefffff : Crash kernel
> 2fff00000-2ffffffff : System RAM
> 
> * After offlining and removing range
>    0x150000000-0x157ffffff
> lsmem
> RANGE                                  SIZE   STATE REMOVABLE  BLOCK
> 0x0000000000000000-0x000000014fffffff  5.3G  online       yes   0-41
> 0x0000000150000000-0x0000000157ffffff  128M offline               42
> 0x0000000158000000-0x00000002ffffffff  6.6G  online       yes  43-95
> 

First of all

1) How are you triggering this :)

2) Why do you say "and removing the range" when it's still listed in 
lsmem output.

lsmem will only list present memory block devices. So if it's still 
listed there, nothing was removed. Or is that prior to actually removing it.


Is this some powerpc dlpar use case?

> The iomem resource gets split into two entries, but kernel code, kernel
> data, and kernel bss remain attached to the lower resource [0–5376M]
> instead of the correct upper resource [5504M–12288M].

Yes.

> 
> As a result, WARN_ON() triggers in release_mem_region_adjustable()
> ("Usecase: split into two entries - we need a new resource")
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 858 at kernel/resource.c:1486
> release_mem_region_adjustable+0x210/0x280
> Modules linked in:
> CPU: 5 UID: 0 PID: 858 Comm: chmem Not tainted 6.17.0-rc2-11707-g2c36aaf3ba4e
> Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
> Krnl PSW : 0704d00180000000 0000024ec0dae0e4
>             (release_mem_region_adjustable+0x214/0x280)
>             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 00000002ffffafc0 fffffffffffffff0 0000000000000000
>             000000014fffffff 0000024ec2257608 0000000000000000 0000024ec2301758
>             0000024ec22680d0 00000000902c9140 0000000150000000 00000002ffffafc0
>             000003ffa61d8d18 0000024ec21fb478 0000024ec0dae014 000001cec194fbb0
> Krnl Code: 0000024ec0dae0d8: af000000            mc      0,0
>             0000024ec0dae0dc: a7f4ffc1            brc     15,0000024ec0dae05e
>            #0000024ec0dae0e0: af000000            mc      0,0
>            >0000024ec0dae0e4: a5defffd            llilh   %r13,65533
>             0000024ec0dae0e8: c04000c6064c        larl    %r4,0000024ec266ed80
>             0000024ec0dae0ee: eb1d400000f8        laa     %r1,%r13,0(%r4)
>             0000024ec0dae0f4: 07e0                bcr     14,%r0
>             0000024ec0dae0f6: a7f4ffc0            brc     15,0000024ec0dae076
> Call Trace:
>   [<0000024ec0dae0e4>] release_mem_region_adjustable+0x214/0x280
> ([<0000024ec0dadf3c>] release_mem_region_adjustable+0x6c/0x280)
>   [<0000024ec10a2130>] try_remove_memory+0x100/0x140
>   [<0000024ec10a4052>] __remove_memory+0x22/0x40
>   [<0000024ec18890f6>] config_mblock_store+0x326/0x3e0
>   [<0000024ec11f7056>] kernfs_fop_write_iter+0x136/0x210
>   [<0000024ec1121e86>] vfs_write+0x236/0x3c0
>   [<0000024ec11221b8>] ksys_write+0x78/0x110
>   [<0000024ec1b6bfbe>] __do_syscall+0x12e/0x350
>   [<0000024ec1b782ce>] system_call+0x6e/0x90
> Last Breaking-Event-Address:
>   [<0000024ec0dae014>] release_mem_region_adjustable+0x144/0x280
> ---[ end trace 0000000000000000 ]---
> 
> Also, resource adjustment doesn't happen and stale resources still cover
> [0-12288M].  Later, memory re-add fails in register_memory_resource()
> with -EBUSY.
> 
> i.e: /proc/iomem is still:
> 00000000-2dfefffff : System RAM
>    158834000-1597b3fff : Kernel code
>    1597b4000-159f50fff : Kernel data
>    15a13c000-15a218fff : Kernel bss
> 2dff00000-2ffefffff : Crash kernel
> 2fff00000-2ffffffff : System RAM
> 
> Enhance release_mem_region_adjustable() to reassign child resources
> to the correct parent after a split. Children are now assigned based on
> their actual range: If they fall within the lower split, keep them in
> the lower parent. If they fall within the upper split, move them to the
> upper parent.
> 
> Kernel code/data/bss regions are not offlined, so they will always
> reside entirely within one parent and never span across both.

Yes.

> 
> Output after the enhancement:
> * Initial state /proc/iomem (before removal of memory block):
> 00000000-2dfefffff : System RAM
>    1f94f8000-1fa477fff : Kernel code
>    1fa478000-1fac14fff : Kernel data
>    1fae00000-1faedcfff : Kernel bss
> 2dff00000-2ffefffff : Crash kernel
> 2fff00000-2ffffffff : System RAM
> 
> * Offline and remove 0x1e8000000-0x1efffffff memory range
> * /proc/iomem
> 00000000-1e7ffffff : System RAM
> 1f0000000-2dfefffff : System RAM
>    1f94f8000-1fa477fff : Kernel code
>    1fa478000-1fac14fff : Kernel data
>    1fae00000-1faedcfff : Kernel bss
> 2dff00000-2ffefffff : Crash kernel
> 2fff00000-2ffffffff : System RAM
> 

Do we need a Fixes: and CC stable?

> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> ---
>   kernel/resource.c | 44 +++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index f9bb5481501a..c329b8a4aa2f 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1388,6 +1388,41 @@ void __release_region(struct resource *parent, resource_size_t start,
>   EXPORT_SYMBOL(__release_region);
>   
>   #ifdef CONFIG_MEMORY_HOTREMOVE
> +static void append_child_to_parent(struct resource *new_parent, struct resource *new_child)
> +{
> +	struct resource *child;
> +
> +	child = new_parent->child;
> +	if (child) {
> +		while (child->sibling)
> +			child = child->sibling;
> +		child->sibling = new_child;

Shouldn't we take care of the address ordering here? I guess this works 
because we process them in left-to-right (lowest-to-highest) address.

> +	} else {
> +		new_parent->child = new_child;
> +	}
> +	new_child->parent = new_parent;
> +	new_child->sibling = NULL;
> +}
> +
> +static void move_children_to_parent(struct resource *old_parent,
> +				    struct resource *new_parent,
> +				    resource_size_t split_addr)

I'd call this "reparent_child_resources". But actually the function is 
weird. Because you only reparents some resources from old to now.

Two questions:

a) Is split_addr really required. Couldn't we derive that from "old_parent"

b) Could we somehow make it clearer (variable names etc) that we are 
only reparenting from "left" to "right" (maybe we can find better names).

Something like

/*
  * Reparent all child resources that no longer belong to "low" after
  * a split to "high". Note that "high" does not have any children,
  * because "low" is the adjusted original resource and "high" is a new
  * resource.
  */
static void reparent_children_after_split(struct resource *low,
		struct resource *high)

Again, maybe we can find better names for left/right low/high.

[...]

>    *
>    * Note:
>    * - Additional release conditions, such as overlapping region, can be
>    *   supported after they are confirmed as valid cases.
> - * - When a busy memory resource gets split into two entries, the code
> - *   assumes that all children remain in the lower address entry for
> - *   simplicity.  Enhance this logic when necessary.
> + * - When a busy memory resource gets split into two entries, its children is

s/is/are/

> + *   reassigned to the correct parent based on their range. If a child memory
> + *   resource overlaps with more than one parent, enhance the logic as needed.
>    */
>   void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>   {
> @@ -1482,6 +1515,7 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>   			new_res->parent = res->parent;
>   			new_res->sibling = res->sibling;
>   			new_res->child = NULL;
> +			move_children_to_parent(res, new_res, end);
>   
>   			if (WARN_ON_ONCE(__adjust_resource(res, res->start,
>   							   start - res->start)))


-- 
Cheers

David / dhildenb

Re: [PATCH] resource: Improve child resource handling in release_mem_region_adjustable()
Posted by Sumanth Korikkar 2 weeks, 6 days ago
> > lsmem
> > RANGE                                  SIZE   STATE REMOVABLE  BLOCK
> > 0x0000000000000000-0x000000014fffffff  5.3G  online       yes   0-41
> > 0x0000000150000000-0x0000000157ffffff  128M offline               42
> > 0x0000000158000000-0x00000002ffffffff  6.6G  online       yes  43-95
> > 
> 
> First of all
> 
> 1) How are you triggering this :)
> 
> 2) Why do you say "and removing the range" when it's still listed in lsmem
> output.
> 
> lsmem will only list present memory block devices. So if it's still listed
> there, nothing was removed. Or is that prior to actually removing it.
> 
> 
> Is this some powerpc dlpar use case?

Hi David,

I am working on the item related to the last discussion -  dynamic
runtime (de)configuration of memory on s390:
https://lore.kernel.org/all/aCx-SJdHd-3Z12af@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com/

I came across the problem when I tried to offline and remove the memory
via /sys/firmware/memory interface.

I have also modified lsmem (not yet upstream) to list deconfigured
memory, which currently appears as offline. An additional "configured"
column is also introduced to show the configuration state, but it is not
displayed here yet (without --output-all).

> > 0x0000000150000000-0x0000000157ffffff  128M offline               42

True, this will not be shown with the master lsmem, since the sysfs
entry is removed after deconfiguration.

> Do we need a Fixes: and CC stable?

It will reference commit 825f787bb496 ("resource: add
release_mem_region_adjustable()"). Since the commit already states
"enhance this logic when necessary," I did not add a Fixes tag.

> > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> > ---
> >   kernel/resource.c | 44 +++++++++++++++++++++++++++++++++++++++-----
> >   1 file changed, 39 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index f9bb5481501a..c329b8a4aa2f 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -1388,6 +1388,41 @@ void __release_region(struct resource *parent, resource_size_t start,
> >   EXPORT_SYMBOL(__release_region);
> >   #ifdef CONFIG_MEMORY_HOTREMOVE
> > +static void append_child_to_parent(struct resource *new_parent, struct resource *new_child)
> > +{
> > +	struct resource *child;
> > +
> > +	child = new_parent->child;
> > +	if (child) {
> > +		while (child->sibling)
> > +			child = child->sibling;
> > +		child->sibling = new_child;
> 
> Shouldn't we take care of the address ordering here? I guess this works
> because we process them in left-to-right (lowest-to-highest) address.

__request_resource() adds the child resources in the increasing order.
With that, we dont need to check the ordering again here.  True, here we
process the child resources from lowest to highest address.

> > +	} else {
> > +		new_parent->child = new_child;
> > +	}
> > +	new_child->parent = new_parent;
> > +	new_child->sibling = NULL;
> > +}
> > +
> > +static void move_children_to_parent(struct resource *old_parent,
> > +				    struct resource *new_parent,
> > +				    resource_size_t split_addr)
> 
> I'd call this "reparent_child_resources". But actually the function is
> weird. Because you only reparents some resources from old to now.
> 
> Two questions:
> 
> a) Is split_addr really required. Couldn't we derive that from "old_parent"

old_parent->end points to old end range before the split, so I think it
doesnt tell where the split boundary is, until __adjust_resource() is
called. Hence, split_addr was added.

> b) Could we somehow make it clearer (variable names etc) that we are only
> reparenting from "left" to "right" (maybe we can find better names).
> 
> Something like
> 
> /*
>  * Reparent all child resources that no longer belong to "low" after
>  * a split to "high". Note that "high" does not have any children,
>  * because "low" is the adjusted original resource and "high" is a new
>  * resource.
>  */
> static void reparent_children_after_split(struct resource *low,
> 		struct resource *high)

Nice. I will use this convention with split_addr.

> >    *
> >    * Note:
> >    * - Additional release conditions, such as overlapping region, can be
> >    *   supported after they are confirmed as valid cases.
> > - * - When a busy memory resource gets split into two entries, the code
> > - *   assumes that all children remain in the lower address entry for
> > - *   simplicity.  Enhance this logic when necessary.
> > + * - When a busy memory resource gets split into two entries, its children is
> 
> s/is/are/

Will correct it. Thank you
Re: [PATCH] resource: Improve child resource handling in release_mem_region_adjustable()
Posted by David Hildenbrand 2 weeks, 6 days ago
> 
> Hi David,

Hi!

> 
> I am working on the item related to the last discussion -  dynamic
> runtime (de)configuration of memory on s390:
> https://lore.kernel.org/all/aCx-SJdHd-3Z12af@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com/
> 
> I came across the problem when I tried to offline and remove the memory
> via /sys/firmware/memory interface.

Ah, that makes sense. Sorry that I didn't immediately connect the dots 
when seeing your name.

> 
> I have also modified lsmem (not yet upstream) to list deconfigured
> memory, which currently appears as offline. An additional "configured"
> column is also introduced to show the configuration state, but it is not
> displayed here yet (without --output-all).

Worth mentioning in the patch description, otherwise it's confusing.

> 
>>> 0x0000000150000000-0x0000000157ffffff  128M offline               42
> 
> True, this will not be shown with the master lsmem, since the sysfs
> entry is removed after deconfiguration.
> 
>> Do we need a Fixes: and CC stable?
> 
> It will reference commit 825f787bb496 ("resource: add
> release_mem_region_adjustable()"). Since the commit already states
> "enhance this logic when necessary," I did not add a Fixes tag.

So if this cannot be triggered yet, all good and no need for Fixes:.

I was assuming that maybe this can be triggered with ppc dlpar, so I was 
concerned.

> 
>>> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
>>> ---
>>>    kernel/resource.c | 44 +++++++++++++++++++++++++++++++++++++++-----
>>>    1 file changed, 39 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/resource.c b/kernel/resource.c
>>> index f9bb5481501a..c329b8a4aa2f 100644
>>> --- a/kernel/resource.c
>>> +++ b/kernel/resource.c
>>> @@ -1388,6 +1388,41 @@ void __release_region(struct resource *parent, resource_size_t start,
>>>    EXPORT_SYMBOL(__release_region);
>>>    #ifdef CONFIG_MEMORY_HOTREMOVE
>>> +static void append_child_to_parent(struct resource *new_parent, struct resource *new_child)
>>> +{
>>> +	struct resource *child;
>>> +
>>> +	child = new_parent->child;
>>> +	if (child) {
>>> +		while (child->sibling)
>>> +			child = child->sibling;
>>> +		child->sibling = new_child;
>>
>> Shouldn't we take care of the address ordering here? I guess this works
>> because we process them in left-to-right (lowest-to-highest) address.
> 
> __request_resource() adds the child resources in the increasing order.
> With that, we dont need to check the ordering again here.  True, here we
> process the child resources from lowest to highest address.
> 
>>> +	} else {
>>> +		new_parent->child = new_child;
>>> +	}
>>> +	new_child->parent = new_parent;
>>> +	new_child->sibling = NULL;
>>> +}
>>> +
>>> +static void move_children_to_parent(struct resource *old_parent,
>>> +				    struct resource *new_parent,
>>> +				    resource_size_t split_addr)
>>
>> I'd call this "reparent_child_resources". But actually the function is
>> weird. Because you only reparents some resources from old to now.
>>
>> Two questions:
>>
>> a) Is split_addr really required. Couldn't we derive that from "old_parent"
> 
> old_parent->end points to old end range before the split, so I think it
> doesnt tell where the split boundary is, until __adjust_resource() is
> called. Hence, split_addr was added.

Makes sense, that's also where the sanity checks happen.

Worth throwing in a comment for the function telling that lower was not 
adjusted yet.

-- 
Cheers

David / dhildenb