From: William Roche <william.roche@oracle.com>

Hello David,

Here is the version with the small nits corrected, and the
'Acked-by' entries you gave me for patch 1 and 2.

---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs large pages, any large page location being impacted
...
We also enrich the messages used to report a memory error relayed to
the VM, providing an identification of the memory page and its size in
case of a large page impacted.
----
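
For orientation, the reset-time recovery this series fixes boils down to
the following loop (a simplified sketch of kvm_unpoison_all() based on
the context visible in patch 1, not a verbatim copy):

    /* On VM reset, drop and replace every recorded poisoned page. */
    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
        QLIST_REMOVE(page, list);
        qemu_ram_remap(page->ram_addr); /* remaps a full backend page */
        g_free(page);
    }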

v1 -> v2:
. I removed the kernel SIGBUS siginfo provided lsb size information
  tracking, relying only on the RAMBlock page_size instead.
. I adapted the 3 patches you pointed me to in order to implement the
  notification mechanism on remap. Thank you for this code!
  I left them as authored by you.
  But I haven't tested if the policy setting works as expected on VM
  reset, only that the replacement of physical memory works.
. I also removed the old memory setting that was kept in qemu_ram_remap(),
  but this small last fix could probably be merged with your last commit.

v2 -> v3:
. dropped the size parameter from qemu_ram_remap() and determine the page
  size when adding it to the poison list, aligning the offset down to the
  pagesize. Multiple sub-pages poisoned on a large page lead to a single
  poison entry.
. introduced a helper function for the mmap code
. added "on lost large page <size>@<ram_addr>" to the error injection
  msg (notation used in qemu_ram_remap() too).
  So only in the case of a large page, it looks like:
  Guest MCE Memory Error at QEMU addr 0x7fc1f5dd6000 and GUEST addr 0x19fd6000 on lost large page 200000@19e00000 of type BUS_MCEERR_AR injected
. as we need the page_size value for the above message, I retrieve the
  value in kvm_arch_on_sigbus_vcpu() to pass the appropriate pointer
  to kvm_hwpoison_page_add(), which doesn't need to align it anymore.
. added a similar message for the ARM platform (removing the MCE
  keyword)
. I also introduced a "fail hard" in the remap notification:
  host_memory_backend_ram_remapped()

v3 -> v4:
. Fixed some commit message typos
. Enhanced some code comments
. Changed the discard fallback conditions to consider only anonymous
  memory
. Fixed some missing variable name changes in intermediary patches.
. Modified the error message given when an error is injected to report
  the case of a large page
. Used snprintf() to generate this message
. Added the same type of message in the ARM case too

v4 -> v5:
. Updated commit messages (for patches 1, 5 and 6)
. Fixed a comment typo in patch 2
. Changed the fallback function parameters to match the
  ram_block_discard_range() function.
. Removed the unused case of remapping a file in this function
. Added the assert(block->fd < 0) in this function too
. I merged my patch 7 with your patch 6 (we only have 6 patches now)

v5 -> v6:
. don't align down ram_addr in kvm_hwpoison_page_add(), but create
  a new entry for each subpage reported as poisoned
. introduced memory error messages similar to those of discard_range()
. introduced a function to retrieve more information about a RAMBlock
  experiencing an error than just its associated page size
. file offset as a uint64_t instead of a ram_addr_t
. changed ownership of patch 6/6

v6 -> v7:
. changed the block location information collection function name to
  qemu_ram_block_info_from_addr()
. display the fd_offset value only when dealing with a file backend
  in kvm_hwpoison_page_add() and qemu_ram_remap()
. better placed the offset alignment computation
. added two missing empty separation lines

This code is scripts/checkpatch.pl clean.
'make check' runs clean on both x86 and ARM.


David Hildenbrand (2):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings

William Roche (4):
  system/physmem: handle hugetlb correctly in qemu_ram_remap()
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  hostmem: Handle remapping of RAM

 accel/kvm/kvm-all.c       |  20 +++-
 backends/hostmem.c        | 189 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |  12 ++-
 include/exec/ramlist.h    |   3 +
 include/system/hostmem.h  |   1 +
 system/physmem.c          | 107 +++++++++++++++------
 target/arm/kvm.c          |   3 +
 8 files changed, 244 insertions(+), 102 deletions(-)

--
2.43.5

From: William Roche <william.roche@oracle.com>

The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
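
This kernel behavior can be seen with a small standalone program (an
illustration only, not part of the patch; it assumes a 2M hugetlb page
size and available hugetlb pages on the host):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define HPAGE (2 * 1024 * 1024UL)

    int main(void)
    {
        /* Map one anonymous hugetlb page. */
        void *p = mmap(NULL, HPAGE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap hugetlb");
            return 1;
        }
        /* Replacing a single 4k subpage fails (EINVAL): a hugetlb
         * mapping cannot be split on a non-huge-page boundary. */
        if (mmap(p, 4096, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0)
            == MAP_FAILED) {
            perror("partial remap (expected failure)");
        }
        /* Remapping the complete huge page works. */
        if (mmap(p, HPAGE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED,
                 -1, 0) == MAP_FAILED) {
            perror("full remap");
            return 1;
        }
        return 0;
    }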

Signed-off-by: William Roche <william.roche@oracle.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 accel/kvm/kvm-all.c       |  2 +-
 include/exec/cpu-common.h |  2 +-
 system/physmem.c          | 38 +++++++++++++++++++++++++++++---------
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
...
-        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        qemu_ram_remap(page->ram_addr);
         g_free(page);
     }
 }
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -XXX,XX +XXX,XX @@ typedef uintptr_t ram_addr_t;
...
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
+void qemu_ram_remap(ram_addr_t addr);
 /* This should not be used by devices. */
 ram_addr_t qemu_ram_addr_from_host(void *ptr);
 ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_free(RAMBlock *block)
 }

 #ifndef _WIN32
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+/*
+ * qemu_ram_remap - remap a single RAM page
+ *
+ * @addr: address in ram_addr_t address space.
+ *
+ * This function will try remapping a single page of guest RAM identified by
+ * @addr, essentially discarding memory to recover from previously poisoned
+ * memory (MCE). The page size depends on the RAMBlock (i.e., hugetlb). @addr
+ * does not have to point at the start of the page.
+ *
+ * This function is only to be used during system resets; it will kill the
+ * VM if remapping failed.
+ */
+void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
-    ram_addr_t offset;
+    uint64_t offset;
     int flags;
     void *area, *vaddr;
     int prot;
+    size_t page_size;

...
                flags |= MAP_ANONYMOUS;
-                area = mmap(vaddr, length, prot, flags, -1, 0);
+                area = mmap(vaddr, page_size, prot, flags, -1, 0);
            }
            if (area != vaddr) {
-                error_report("Could not remap addr: "
-                             RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
-                             length, addr);
+                error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
+                             " +%zx", block->idstr, offset,
+                             block->fd_offset, page_size);
                exit(1);
            }
-            memory_try_enable_merging(vaddr, length);
-            qemu_ram_setup_dump(vaddr, length);
+            memory_try_enable_merging(vaddr, page_size);
...
...
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory, fall back to remapping the
location(s).
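
For context, ram_block_discard_range() drops the content of the range
roughly as sketched below (a simplified rendition of existing QEMU
behavior, with error handling and corner cases omitted); the new fallback
triggers when these calls fail:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Sketch only: how a backend page gets discarded before reuse. */
    static int discard_page_sketch(void *host_addr, size_t page_size,
                                   int fd, uint64_t fd_offset)
    {
        if (fd >= 0) {
            /* File-backed (e.g. hugetlbfs): punch a hole in the file. */
            if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          fd_offset, page_size) != 0) {
                return -errno;
            }
        }
        /* Anonymous memory: let the kernel repopulate on next access. */
        if (madvise(host_addr, page_size, MADV_DONTNEED) != 0) {
            return -errno;
        }
        return 0;
    }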

Signed-off-by: William Roche <william.roche@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 system/physmem.c | 58 ++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_free(RAMBlock *block)
 }

 #ifndef _WIN32
+/* Simply remap the given VM memory location from start to start+length */
+static int qemu_ram_remap_mmap(RAMBlock *block, uint64_t start, size_t length)
+{
+    int flags, prot;
+    void *area;
+    void *host_startaddr = block->host + start;
+
...
+    flags |= block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE;
+    flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
+    prot = PROT_READ;
+    prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
+    area = mmap(host_startaddr, length, prot, flags, -1, 0);
+    return area != host_startaddr ? -errno : 0;
+}
+
 /*
  * qemu_ram_remap - remap a single RAM page
  *
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
     uint64_t offset;
-    int flags;
-    void *area, *vaddr;
-    int prot;
+    void *vaddr;
     size_t page_size;

     RAMBLOCK_FOREACH(block) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
            } else if (xen_enabled()) {
                abort();
            } else {
-                flags = MAP_FIXED;
-                flags |= block->flags & RAM_SHARED ?
-                    MAP_SHARED : MAP_PRIVATE;
-                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
-                prot = PROT_READ;
...
-                } else {
-                    flags |= MAP_ANONYMOUS;
-                    area = mmap(vaddr, page_size, prot, flags, -1, 0);
-                }
-                if (area != vaddr) {
-                    error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
-                                 " +%zx", block->idstr, offset,
-                                 block->fd_offset, page_size);
-                    exit(1);
+                if (ram_block_discard_range(block, offset, page_size) != 0) {
+                    /*
+                     * Fall back to using mmap() only for anonymous mapping,
+                     * as if a backing file is associated we may not be able
+                     * to recover the memory in all cases.
+                     * So don't take the risk of using only mmap and fail now.
+                     */
+                    if (block->fd >= 0) {
+                        error_report("Could not remap RAM %s:%" PRIx64 "+%"
+                                     PRIx64 " +%zx", block->idstr, offset,
+                                     block->fd_offset, page_size);
+                        exit(1);
+                    }
+                    if (qemu_ram_remap_mmap(block, offset, page_size) != 0) {
+                        error_report("Could not remap RAM %s:%" PRIx64 " +%zx",
+                                     block->idstr, offset, page_size);
+                        exit(1);
+                    }
                }
            }
            memory_try_enable_merging(vaddr, page_size);
            qemu_ram_setup_dump(vaddr, page_size);
--
2.43.5
From: William Roche <william.roche@oracle.com>

In case of a large page impacted by a memory error, provide
information about the impacted large page before the memory
error injection message.

This message also appears on RAS-enabled ARM platforms, with the
introduction of an error injection message similar to the x86 one.

In the case of a large page impacted, we now report:
Memory Error on large page from <backend>:<address>+<fd_offset> +<page_size>

The +<fd_offset> information is only provided with a file backend.
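
For example, with a 2M large page at RAMBlock offset 0x19e00000 (all
values and the block name below are made up for illustration), this gives:

    Memory Error on large page from pc.ram:19e00000+0 +200000
    Memory Error on large page from pc.ram:19e00000 +200000

where the first form corresponds to a file backend and the second one to
anonymous memory.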

Signed-off-by: William Roche <william.roche@oracle.com>
---
 accel/kvm/kvm-all.c       | 18 ++++++++++++++++++
 include/exec/cpu-common.h | 10 ++++++++++
 system/physmem.c          | 22 ++++++++++++++++++++++
 target/arm/kvm.c          |  3 +++
 4 files changed, 53 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void kvm_unpoison_all(void *param)
 void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
     HWPoisonPage *page;
+    struct RAMBlockInfo rb_info;
+
+    if (qemu_ram_block_info_from_addr(ram_addr, &rb_info)) {
+        size_t ps = rb_info.page_size;
+
+        if (ps > TARGET_PAGE_SIZE) {
+            uint64_t offset = QEMU_ALIGN_DOWN(ram_addr - rb_info.offset, ps);
+
+            if (rb_info.fd >= 0) {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             "+%" PRIx64 " +%zx", rb_info.idstr, offset,
+                             rb_info.fd_offset, ps);
+            } else {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             " +%zx", rb_info.idstr, offset, ps);
+            }
+        }
+    }

     QLIST_FOREACH(page, &hwpoison_page_list, list) {
         if (page->ram_addr == ram_addr) {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -XXX,XX +XXX,XX @@ int qemu_ram_get_fd(RAMBlock *rb);
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);

+struct RAMBlockInfo {
+    char idstr[256];
+    ram_addr_t offset;
+    int fd;
+    uint64_t fd_offset;
+    size_t page_size;
+};
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *block);
+
 /**
  * cpu_address_space_init:
  * @cpu: CPU to add this address space to
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ size_t qemu_ram_pagesize_largest(void)
     return largest;
 }

+/* Copy RAMBlock information associated to the given ram_addr location */
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *b_info)
+{
+    RAMBlock *rb;
+
+    assert(b_info);
+
+    RCU_READ_LOCK_GUARD();
+    rb = qemu_get_ram_block(ram_addr);
+    if (!rb) {
+        return false;
+    }
+
+    pstrcat(b_info->idstr, sizeof(b_info->idstr), rb->idstr);
+    b_info->offset = rb->offset;
+    b_info->fd = rb->fd;
+    b_info->fd_offset = rb->fd_offset;
+    b_info->page_size = rb->page_size;
+    return true;
+}
+
 static int memory_try_enable_merging(void *addr, size_t len)
 {
     if (!machine_mem_merge(current_machine)) {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
             kvm_cpu_synchronize_state(c);
             if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
                 kvm_inject_arm_sea(c);
+                error_report("Guest Memory Error at QEMU addr %p and "
+                             "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
+                             addr, paddr, "BUS_MCEERR_AR");
             } else {
                 error_report("failed to record the error");
                 abort();
--
2.43.5
From: David Hildenbrand <david@redhat.com>

Notify registered listeners about the remap at the end of
qemu_ram_remap() so that, e.g., a memory backend can re-apply its
settings correctly.
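
As a usage sketch (a hypothetical listener, not part of this patch), a
component reacts to remaps by registering the new callback through the
existing RAM block notifier mechanism from include/exec/ramlist.h:

    static void my_ram_block_remapped(RAMBlockNotifier *n, void *host,
                                      size_t offset, size_t size)
    {
        /* Re-apply settings on [host + offset, host + offset + size). */
    }

    static RAMBlockNotifier my_notifier = {
        .ram_block_remapped = my_ram_block_remapped,
    };

    static void my_listener_init(void)
    {
        ram_block_notifier_add(&my_notifier);
    }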

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 hw/core/numa.c         | 11 +++++++++++
 include/exec/ramlist.h |  3 +++
 system/physmem.c       |  1 +
 3 files changed, 15 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -XXX,XX +XXX,XX @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
         }
     }
 }
+
+void ram_block_notify_remap(void *host, size_t offset, size_t size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_remapped) {
+            notifier->ram_block_remapped(notifier, host, offset, size);
+        }
+    }
+}
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -XXX,XX +XXX,XX @@ struct RAMBlockNotifier {
                               size_t max_size);
     void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size,
                               size_t new_size);
+    void (*ram_block_remapped)(RAMBlockNotifier *n, void *host, size_t offset,
+                               size_t size);
     QLIST_ENTRY(RAMBlockNotifier) next;
 };

@@ -XXX,XX +XXX,XX @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
 void ram_block_notify_add(void *host, size_t size, size_t max_size);
 void ram_block_notify_remove(void *host, size_t size, size_t max_size);
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
+void ram_block_notify_remap(void *host, size_t offset, size_t size);

 GString *ram_block_format(void);

diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
            }
            memory_try_enable_merging(vaddr, page_size);
            qemu_ram_setup_dump(vaddr, page_size);
+            ram_block_notify_remap(block->host, offset, page_size);
        }

        break;
--
2.43.5
From: David Hildenbrand <david@redhat.com>

We want to reuse the functionality when remapping RAM.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c | 155 ++++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 73 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index XXXXXXX..XXXXXXX 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -XXX,XX +XXX,XX @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
 #endif

+static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
+                                               void *ptr, uint64_t size,
+                                               Error **errp)
+{
+    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
+
+    if (backend->merge) {
+        qemu_madvise(ptr, size, QEMU_MADV_MERGEABLE);
+    }
+    if (!backend->dump) {
+        qemu_madvise(ptr, size, QEMU_MADV_DONTDUMP);
+    }
+#ifdef CONFIG_NUMA
+    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+    /* lastbit == MAX_NODES means maxnode = 0 */
+    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+    /*
+     * Ensure policy won't be ignored in case memory is preallocated
+     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
+     * this doesn't catch hugepage case.
+     */
+    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+    int mode = backend->policy;
+
+    /*
+     * Check for invalid host-nodes and policies and give more verbose
+     * error messages than mbind().
+     */
+    if (maxnode && backend->policy == MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be empty for policy default,"
+                   " or you should explicitly specify a policy other"
+                   " than default");
+        return;
+    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be set for policy %s",
+                   HostMemPolicy_str(backend->policy));
+        return;
+    }
+
+    /*
+     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
+     * as argument to mbind() due to an old Linux bug (feature?) which
+     * cuts off the last specified node. This means backend->host_nodes
+     * must have MAX_NODES+1 bits available.
+     */
+    assert(sizeof(backend->host_nodes) >=
+           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+    assert(maxnode <= MAX_NODES);
+
+#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
+    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
+        /*
+         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+         * silently picks the first node.
+         */
+        mode = MPOL_PREFERRED_MANY;
+    }
+#endif
+
+    if (maxnode &&
+        mbind(ptr, size, mode, backend->host_nodes, maxnode + 1, flags)) {
+        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+            error_setg_errno(errp, errno,
+                             "cannot bind memory to host NUMA nodes");
+            return;
+        }
+    }
+#endif
+    /*
+     * Preallocate memory after the NUMA policy has been instantiated.
+     * This is necessary to guarantee memory is allocated with
+     * specified NUMA policy in place.
+     */
+    if (backend->prealloc &&
+        !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
+                           ptr, size, backend->prealloc_threads,
+                           backend->prealloc_context, async, errp)) {
+        return;
+    }
+}
+
 char *
 host_memory_backend_get_name(HostMemoryBackend *backend)
 {
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
     void *ptr;
     uint64_t sz;
     size_t pagesize;
-    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);

     if (!bc->alloc) {
         return;
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         return;
     }

-    if (backend->merge) {
-        qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
-    }
-    if (!backend->dump) {
-        qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
-    }
-#ifdef CONFIG_NUMA
-    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-    /* lastbit == MAX_NODES means maxnode = 0 */
-    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-    /*
-     * Ensure policy won't be ignored in case memory is preallocated
-     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
-     * this doesn't catch hugepage case.
-     */
-    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-    int mode = backend->policy;
-
-    /* check for invalid host-nodes and policies and give more verbose
-     * error messages than mbind(). */
-    if (maxnode && backend->policy == MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be empty for policy default,"
-                   " or you should explicitly specify a policy other"
-                   " than default");
-        return;
-    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be set for policy %s",
-                   HostMemPolicy_str(backend->policy));
-        return;
-    }
-
-    /*
-     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
-     * as argument to mbind() due to an old Linux bug (feature?) which
-     * cuts off the last specified node. This means backend->host_nodes
-     * must have MAX_NODES+1 bits available.
-     */
-    assert(sizeof(backend->host_nodes) >=
-           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-    assert(maxnode <= MAX_NODES);
-
-#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
-    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
-        /*
-         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
-         * silently picks the first node.
-         */
-        mode = MPOL_PREFERRED_MANY;
-    }
-#endif
-
-    if (maxnode &&
-        mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
-        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
-            error_setg_errno(errp, errno,
-                             "cannot bind memory to host NUMA nodes");
-            return;
-        }
-    }
-#endif
-    /*
-     * Preallocate memory after the NUMA policy has been instantiated.
-     * This is necessary to guarantee memory is allocated with
-     * specified NUMA policy in place.
-     */
-    if (backend->prealloc && !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
-                                                ptr, sz,
-                                                backend->prealloc_threads,
-                                                backend->prealloc_context,
-                                                async, errp)) {
-        return;
-    }
+    host_memory_backend_apply_settings(backend, ptr, sz, errp);
 }

 static bool
--
2.43.5
From: William Roche <william.roche@oracle.com>

Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.

Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
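
The backends/hostmem.c hunk is collapsed in this diff view; its shape is
roughly the following (a reconstruction sketched from the description
above, so the matching logic and field names are assumptions, not the
literal patch):

    static void host_memory_backend_ram_remapped(RAMBlockNotifier *n,
                                                 void *host, size_t offset,
                                                 size_t size)
    {
        /* 'ram_notifier' is an assumed field name for this sketch. */
        HostMemoryBackend *backend = container_of(n, HostMemoryBackend,
                                                  ram_notifier);
        Error *err = NULL;

        host_memory_backend_apply_settings(backend, (char *)host + offset,
                                           size, &err);
        if (err) {
            /* Fail hard: losing settings on a replaced page is fatal. */
            error_report_err(err);
            exit(1);
        }
    }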

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c       | 34 ++++++++++++++++++++++++++++++++++
 include/system/hostmem.h |  1 +
 system/physmem.c         |  4 ----
...
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
     uint64_t offset;
-    void *vaddr;
     size_t page_size;

     RAMBLOCK_FOREACH(block) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
...
-            vaddr = ramblock_ptr(block, offset);
            if (block->flags & RAM_PREALLOC) {
                ;
            } else if (xen_enabled()) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
                        exit(1);
                    }
                }
            }
-            memory_try_enable_merging(vaddr, page_size);
-            qemu_ram_setup_dump(vaddr, page_size);
            ram_block_notify_remap(block->host, offset, page_size);
        }

        break;
--
2.43.5