From: William Roche <william.roche@oracle.com>

Hello David,

Here is the version with the small nits corrected.

---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs large pages, any large page location being impacted
...
We also enrich the messages used to report a memory error relayed to
the VM, providing an identification of the memory page and its size in
case of a large page impacted.
----

v1 -> v2:
. Removed the kernel SIGBUS siginfo-provided lsb size information
  tracking; only relying on the RAMBlock page_size instead.
. Adapted the 3 patches you pointed me to in order to implement the
  notification mechanism on remap. Thank you for this code!
  I left them as authored by you.
  But I haven't tested whether the policy setting works as expected on VM
  reset, only that the replacement of physical memory works.
. Also removed the old memory setting that was kept in qemu_ram_remap(),
  but this small last fix could probably be merged with your last commit.

v2 -> v3:
. Dropped the size parameter from qemu_ram_remap() and determined the page
  size when adding it to the poison list, aligning the offset down to the
  page size. Multiple sub-pages poisoned on a large page lead to a single
  poison entry.
. Introduced a helper function for the mmap code.
. Added "on lost large page <size>@<ram_addr>" to the error injection
  message (notation used in qemu_ram_remap() too).
  So only in the case of a large page, it looks like:
  Guest MCE Memory Error at QEMU addr 0x7fc1f5dd6000 and GUEST addr 0x19fd6000 on lost large page 200000@19e00000 of type BUS_MCEERR_AR injected
. As we need the page_size value for the above message, I retrieve the
  value in kvm_arch_on_sigbus_vcpu() to pass the appropriate pointer
  to kvm_hwpoison_page_add(), which doesn't need to align it anymore.
. Added a similar message for the ARM platform (removing the MCE
  keyword).
. Also introduced a "fail hard" in the remap notification:
  host_memory_backend_ram_remapped().

v3 -> v4:
. Fixed some commit message typos.
. Enhanced some code comments.
. Changed the discard fallback conditions to consider only anonymous
  memory.
. Fixed some missing variable name changes in intermediary patches.
. Modified the error message given when an error is injected to report
  the case of a large page.
. Used snprintf() to generate this message.
. Added this same type of message in the ARM case too.

v4 -> v5:
. Updated commit messages (for patches 1, 5 and 6).
. Fixed a comment typo in patch 2.
. Changed the fallback function parameters to match the
  ram_block_discard_range() function.
. Removed the unused case of remapping a file in this function.
. Added the assert(block->fd < 0) in this function too.
. Merged my patch 7 with your patch 6 (we only have 6 patches now).

v5 -> v6:
. Don't align down ram_addr in kvm_hwpoison_page_add(), but create
  a new entry for each subpage reported as poisoned.
. Introduced memory error messages similar to discard_range()'s.
. Introduced a function to retrieve more information about a RAMBlock
  experiencing an error than just its associated page size.
. File offset as a uint64_t instead of a ram_addr_t.
. Changed ownership of patch 6/6.

v6 -> v7:
. Changed the block location information collection function name to
  qemu_ram_block_info_from_addr().
. Display the fd_offset value only when dealing with a file backend
  in kvm_hwpoison_page_add() and qemu_ram_remap().
. Better placed the offset alignment computation.
. Added two missing empty separation lines.

This code is scripts/checkpatch.pl clean.
'make check' runs clean on both x86 and ARM.


David Hildenbrand (2):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings

William Roche (4):
  system/physmem: handle hugetlb correctly in qemu_ram_remap()
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  hostmem: Handle remapping of RAM

 accel/kvm/kvm-all.c       |  20 +++-
 backends/hostmem.c        | 189 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |  12 ++-
 include/exec/ramlist.h    |   3 +
 include/system/hostmem.h  |   1 +
 system/physmem.c          | 107 +++++++++++++------
 target/arm/kvm.c          |   3 +
 8 files changed, 244 insertions(+), 102 deletions(-)

--
2.43.5
From: William Roche <william.roche@oracle.com>

The list of hwpoison pages used to remap the memory on reset
is based on the backend's real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.

Signed-off-by: William Roche <william.roche@oracle.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 accel/kvm/kvm-all.c       |  2 +-
 include/exec/cpu-common.h |  2 +-
 system/physmem.c          | 38 +++++++++++++++++++++++++++++---------
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
...
-        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        qemu_ram_remap(page->ram_addr);
         g_free(page);
     }
 }
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -XXX,XX +XXX,XX @@ typedef uintptr_t ram_addr_t;
...
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
+void qemu_ram_remap(ram_addr_t addr);
 /* This should not be used by devices. */
 ram_addr_t qemu_ram_addr_from_host(void *ptr);
 ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_free(RAMBlock *block)
 }

 #ifndef _WIN32
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+/*
+ * qemu_ram_remap - remap a single RAM page
+ *
+ * @addr: address in ram_addr_t address space.
+ *
+ * This function will try remapping a single page of guest RAM identified by
+ * @addr, essentially discarding memory to recover from previously poisoned
+ * memory (MCE). The page size depends on the RAMBlock (i.e., hugetlb). @addr
+ * does not have to point at the start of the page.
+ *
+ * This function is only to be used during system resets; it will kill the
+ * VM if remapping failed.
+ */
+void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
-    ram_addr_t offset;
+    uint64_t offset;
     int flags;
     void *area, *vaddr;
     int prot;
+    size_t page_size;

...
             flags |= MAP_ANONYMOUS;
-            area = mmap(vaddr, length, prot, flags, -1, 0);
+            area = mmap(vaddr, page_size, prot, flags, -1, 0);
         }
         if (area != vaddr) {
-            error_report("Could not remap addr: "
-                         RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
-                         length, addr);
+            error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
+                         " +%zx", block->idstr, offset,
+                         block->fd_offset, page_size);
             exit(1);
         }
-        memory_try_enable_merging(vaddr, length);
-        qemu_ram_setup_dump(vaddr, length);
+        memory_try_enable_merging(vaddr, page_size);
...
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory, fall back to remapping the
location(s).

Signed-off-by: William Roche <william.roche@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 system/physmem.c | 58 ++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_free(RAMBlock *block)
 }

 #ifndef _WIN32
+/* Simply remap the given VM memory location from start to start+length */
+static int qemu_ram_remap_mmap(RAMBlock *block, uint64_t start, size_t length)
+{
+    int flags, prot;
+    void *area;
+    void *host_startaddr = block->host + start;
+
...
+    flags |= block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE;
+    flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
+    prot = PROT_READ;
+    prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
+    area = mmap(host_startaddr, length, prot, flags, -1, 0);
+    return area != host_startaddr ? -errno : 0;
+}
+
 /*
  * qemu_ram_remap - remap a single RAM page
  *
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
     uint64_t offset;
-    int flags;
-    void *area, *vaddr;
-    int prot;
+    void *vaddr;
     size_t page_size;

     RAMBLOCK_FOREACH(block) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
             ;
         } else if (xen_enabled()) {
             abort();
-        } else {
-            flags = MAP_FIXED;
-            flags |= block->flags & RAM_SHARED ?
-                     MAP_SHARED : MAP_PRIVATE;
-            flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
-            prot = PROT_READ;
...
-            } else {
-                flags |= MAP_ANONYMOUS;
-                area = mmap(vaddr, page_size, prot, flags, -1, 0);
-            }
-            if (area != vaddr) {
-                error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
-                             " +%zx", block->idstr, offset,
-                             block->fd_offset, page_size);
-                exit(1);
+        if (ram_block_discard_range(block, offset, page_size) != 0) {
+            /*
+             * Fall back to using mmap() only for anonymous mapping,
+             * as if a backing file is associated we may not be able
+             * to recover the memory in all cases.
+             * So don't take the risk of using only mmap and fail now.
+             */
+            if (block->fd >= 0) {
+                error_report("Could not remap RAM %s:%" PRIx64 "+%"
+                             PRIx64 " +%zx", block->idstr, offset,
+                             block->fd_offset, page_size);
+                exit(1);
+            }
+            if (qemu_ram_remap_mmap(block, offset, page_size) != 0) {
+                error_report("Could not remap RAM %s:%" PRIx64 " +%zx",
+                             block->idstr, offset, page_size);
+                exit(1);
+            }
         }
         memory_try_enable_merging(vaddr, page_size);
         qemu_ram_setup_dump(vaddr, page_size);
--
2.43.5
From: William Roche <william.roche@oracle.com>

In case of a large page impacted by a memory error, provide
information about the impacted large page before the memory
error injection message.

This message also appears on RAS-enabled ARM platforms, with the
introduction of an error injection message similar to the x86 one.

In the case of a large page impacted, we now report:
Memory Error on large page from <backend>:<address>+<fd_offset> +<page_size>

The +<fd_offset> information is only provided with a file backend.

Signed-off-by: William Roche <william.roche@oracle.com>
---
 accel/kvm/kvm-all.c       | 18 ++++++++++++++++++
 include/exec/cpu-common.h | 10 ++++++++++
 system/physmem.c          | 22 ++++++++++++++++++++++
 target/arm/kvm.c          |  3 +++
 4 files changed, 53 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void kvm_unpoison_all(void *param)
 void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
     HWPoisonPage *page;
+    struct RAMBlockInfo rb_info;
+
+    if (qemu_ram_block_info_from_addr(ram_addr, &rb_info)) {
+        size_t ps = rb_info.page_size;
+
+        if (ps > TARGET_PAGE_SIZE) {
+            uint64_t offset = QEMU_ALIGN_DOWN(ram_addr - rb_info.offset, ps);
+
+            if (rb_info.fd >= 0) {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             "+%" PRIx64 " +%zx", rb_info.idstr, offset,
+                             rb_info.fd_offset, ps);
+            } else {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             " +%zx", rb_info.idstr, offset, ps);
+            }
+        }
+    }

     QLIST_FOREACH(page, &hwpoison_page_list, list) {
         if (page->ram_addr == ram_addr) {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -XXX,XX +XXX,XX @@ int qemu_ram_get_fd(RAMBlock *rb);
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);

+struct RAMBlockInfo {
+    char idstr[256];
+    ram_addr_t offset;
+    int fd;
+    uint64_t fd_offset;
+    size_t page_size;
+};
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *block);
+
 /**
  * cpu_address_space_init:
  * @cpu: CPU to add this address space to
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ size_t qemu_ram_pagesize_largest(void)
     return largest;
 }

+/* Copy RAMBlock information associated to the given ram_addr location */
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *b_info)
+{
+    RAMBlock *rb;
+
+    assert(b_info);
+
+    RCU_READ_LOCK_GUARD();
+    rb = qemu_get_ram_block(ram_addr);
+    if (!rb) {
+        return false;
+    }
+
+    pstrcat(b_info->idstr, sizeof(b_info->idstr), rb->idstr);
+    b_info->offset = rb->offset;
+    b_info->fd = rb->fd;
+    b_info->fd_offset = rb->fd_offset;
+    b_info->page_size = rb->page_size;
+    return true;
+}
+
 static int memory_try_enable_merging(void *addr, size_t len)
 {
     if (!machine_mem_merge(current_machine)) {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
             kvm_cpu_synchronize_state(c);
             if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
                 kvm_inject_arm_sea(c);
+                error_report("Guest Memory Error at QEMU addr %p and "
+                             "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
+                             addr, paddr, "BUS_MCEERR_AR");
             } else {
                 error_report("failed to record the error");
                 abort();
--
2.43.5
From: David Hildenbrand <david@redhat.com>

Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 hw/core/numa.c         | 11 +++++++++++
 include/exec/ramlist.h |  3 +++
 system/physmem.c       |  1 +
 3 files changed, 15 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -XXX,XX +XXX,XX @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
         }
     }
 }
+
+void ram_block_notify_remap(void *host, size_t offset, size_t size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_remapped) {
+            notifier->ram_block_remapped(notifier, host, offset, size);
+        }
+    }
+}
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -XXX,XX +XXX,XX @@ struct RAMBlockNotifier {
                             size_t max_size);
39 | size_t max_size); | 39 | size_t max_size); |
40 | void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size, | 40 | void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size, |
41 | size_t new_size); | 41 | size_t new_size); |
42 | + void (*ram_block_remapped)(RAMBlockNotifier *n, void *host, size_t offset, | 42 | + void (*ram_block_remapped)(RAMBlockNotifier *n, void *host, size_t offset, |
43 | + size_t size); | 43 | + size_t size); |
44 | QLIST_ENTRY(RAMBlockNotifier) next; | 44 | QLIST_ENTRY(RAMBlockNotifier) next; |
45 | }; | 45 | }; |
46 | 46 | ||
47 | @@ -XXX,XX +XXX,XX @@ void ram_block_notifier_remove(RAMBlockNotifier *n); | 47 | @@ -XXX,XX +XXX,XX @@ void ram_block_notifier_remove(RAMBlockNotifier *n); |
48 | void ram_block_notify_add(void *host, size_t size, size_t max_size); | 48 | void ram_block_notify_add(void *host, size_t size, size_t max_size); |
49 | void ram_block_notify_remove(void *host, size_t size, size_t max_size); | 49 | void ram_block_notify_remove(void *host, size_t size, size_t max_size); |
50 | void ram_block_notify_resize(void *host, size_t old_size, size_t new_size); | 50 | void ram_block_notify_resize(void *host, size_t old_size, size_t new_size); |
51 | +void ram_block_notify_remap(void *host, size_t offset, size_t size); | 51 | +void ram_block_notify_remap(void *host, size_t offset, size_t size); |
52 | 52 | ||
53 | GString *ram_block_format(void); | 53 | GString *ram_block_format(void); |
54 | 54 | ||
55 | diff --git a/system/physmem.c b/system/physmem.c | 55 | diff --git a/system/physmem.c b/system/physmem.c |
56 | index XXXXXXX..XXXXXXX 100644 | 56 | index XXXXXXX..XXXXXXX 100644 |
57 | --- a/system/physmem.c | 57 | --- a/system/physmem.c |
58 | +++ b/system/physmem.c | 58 | +++ b/system/physmem.c |
59 | @@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr) | 59 | @@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr) |
60 | } | 60 | } |
61 | memory_try_enable_merging(vaddr, page_size); | 61 | memory_try_enable_merging(vaddr, page_size); |
62 | qemu_ram_setup_dump(vaddr, page_size); | 62 | qemu_ram_setup_dump(vaddr, page_size); |
63 | + ram_block_notify_remap(block->host, offset, page_size); | 63 | + ram_block_notify_remap(block->host, offset, page_size); |
64 | } | 64 | } |
65 | 65 | ||
66 | break; | 66 | break; |
67 | -- | 67 | -- |
68 | 2.43.5 | 68 | 2.43.5 | diff view generated by jsdifflib |
From: David Hildenbrand <david@redhat.com>

We want to reuse the functionality when remapping RAM.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c | 155 ++++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 73 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index XXXXXXX..XXXXXXX 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -XXX,XX +XXX,XX @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
 #endif

+static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
+                                               void *ptr, uint64_t size,
+                                               Error **errp)
+{
+    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
+
+    if (backend->merge) {
+        qemu_madvise(ptr, size, QEMU_MADV_MERGEABLE);
+    }
+    if (!backend->dump) {
+        qemu_madvise(ptr, size, QEMU_MADV_DONTDUMP);
+    }
+#ifdef CONFIG_NUMA
+    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+    /* lastbit == MAX_NODES means maxnode = 0 */
+    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+    /*
+     * Ensure policy won't be ignored in case memory is preallocated
+     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
+     * this doesn't catch hugepage case.
+     */
+    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+    int mode = backend->policy;
+
+    /*
+     * Check for invalid host-nodes and policies and give more verbose
+     * error messages than mbind().
+     */
+    if (maxnode && backend->policy == MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be empty for policy default,"
+                   " or you should explicitly specify a policy other"
+                   " than default");
+        return;
+    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be set for policy %s",
+                   HostMemPolicy_str(backend->policy));
+        return;
+    }
+
+    /*
+     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
+     * as argument to mbind() due to an old Linux bug (feature?) which
+     * cuts off the last specified node. This means backend->host_nodes
+     * must have MAX_NODES+1 bits available.
+     */
+    assert(sizeof(backend->host_nodes) >=
+           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+    assert(maxnode <= MAX_NODES);
+
+#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
+    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
+        /*
+         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+         * silently picks the first node.
+         */
+        mode = MPOL_PREFERRED_MANY;
+    }
+#endif
+
+    if (maxnode &&
+        mbind(ptr, size, mode, backend->host_nodes, maxnode + 1, flags)) {
+        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+            error_setg_errno(errp, errno,
+                             "cannot bind memory to host NUMA nodes");
+            return;
+        }
+    }
+#endif
+    /*
+     * Preallocate memory after the NUMA policy has been instantiated.
+     * This is necessary to guarantee memory is allocated with
+     * specified NUMA policy in place.
+     */
+    if (backend->prealloc &&
+        !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
+                           ptr, size, backend->prealloc_threads,
+                           backend->prealloc_context, async, errp)) {
+        return;
+    }
+}
+
 char *
 host_memory_backend_get_name(HostMemoryBackend *backend)
 {
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
     void *ptr;
     uint64_t sz;
     size_t pagesize;
-    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);

     if (!bc->alloc) {
         return;
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         return;
     }

-    if (backend->merge) {
-        qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
-    }
-    if (!backend->dump) {
-        qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
-    }
-#ifdef CONFIG_NUMA
-    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-    /* lastbit == MAX_NODES means maxnode = 0 */
-    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-    /*
-     * Ensure policy won't be ignored in case memory is preallocated
-     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
-     * this doesn't catch hugepage case.
-     */
-    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-    int mode = backend->policy;
-
-    /* check for invalid host-nodes and policies and give more verbose
-     * error messages than mbind(). */
-    if (maxnode && backend->policy == MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be empty for policy default,"
-                   " or you should explicitly specify a policy other"
-                   " than default");
-        return;
-    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be set for policy %s",
-                   HostMemPolicy_str(backend->policy));
-        return;
-    }
-
-    /*
-     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
-     * as argument to mbind() due to an old Linux bug (feature?) which
-     * cuts off the last specified node. This means backend->host_nodes
-     * must have MAX_NODES+1 bits available.
-     */
-    assert(sizeof(backend->host_nodes) >=
-           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-    assert(maxnode <= MAX_NODES);
-
-#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
-    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
-        /*
-         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
-         * silently picks the first node.
-         */
-        mode = MPOL_PREFERRED_MANY;
-    }
-#endif
-
-    if (maxnode &&
-        mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
-        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
-            error_setg_errno(errp, errno,
-                             "cannot bind memory to host NUMA nodes");
-            return;
-        }
-    }
-#endif
-    /*
-     * Preallocate memory after the NUMA policy has been instantiated.
-     * This is necessary to guarantee memory is allocated with
-     * specified NUMA policy in place.
-     */
-    if (backend->prealloc && !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
-                                                ptr, sz,
-                                                backend->prealloc_threads,
-                                                backend->prealloc_context,
-                                                async, errp)) {
-        return;
-    }
+    host_memory_backend_apply_settings(backend, ptr, sz, errp);
 }

 static bool
--
2.43.5
From: William Roche <william.roche@oracle.com>

Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.

Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c       | 34 ++++++++++++++++++++++++++++++++++
 include/system/hostmem.h |  1 +
 system/physmem.c         |  4 ----
...
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
     uint64_t offset;
-    void *vaddr;
     size_t page_size;

     RAMBLOCK_FOREACH(block) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
...
-        vaddr = ramblock_ptr(block, offset);
         if (block->flags & RAM_PREALLOC) {
             ;
         } else if (xen_enabled()) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
                 exit(1);
             }
         }
-        memory_try_enable_merging(vaddr, page_size);
-        qemu_ram_setup_dump(vaddr, page_size);
         ram_block_notify_remap(block->host, offset, page_size);
     }

--
2.43.5