From: William Roche <william.roche@oracle.com>

Hello David,

Here is the version with the small nits corrected,
and the 'Acked-by' entries you gave me for patches 1 and 2.

---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.

When using hugetlbfs large pages, any large page location being impacted
...
. introduce a function to retrieve more information about a RAMBlock
  experiencing an error than just its associated page size
. file offset as a uint64_t instead of a ram_addr_t
. changed ownership of patch 6/6

v6->v7:
. change the block location information collection function name to
  qemu_ram_block_info_from_addr()
. display the fd_offset value only when dealing with a file backend
  in kvm_hwpoison_page_add() and qemu_ram_remap()
. better placed offset alignment computation
. added two missing empty separation lines

This code is scripts/checkpatch.pl clean.
'make check' runs clean on both x86 and ARM.

David Hildenbrand (2):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings

William Roche (4):
  system/physmem: handle hugetlb correctly in qemu_ram_remap()
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  hostmem: Handle remapping of RAM

 accel/kvm/kvm-all.c       |  20 +++-
 backends/hostmem.c        | 189 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |  12 ++-
 include/exec/ramlist.h    |   3 +
 include/system/hostmem.h  |   1 +
 system/physmem.c          | 107 +++++++++++++++------
 target/arm/kvm.c          |   3 +
 8 files changed, 244 insertions(+), 102 deletions(-)

--
2.43.5
...
The list of hwpoison pages used to remap the memory on reset
is based on the backend's real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
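
To illustrate the constraint, here is a minimal sketch (a hypothetical
helper, not the code added by this patch) of remapping a complete
hugetlb page from its backing file; it assumes a shared file mapping
where fd_offset + start stays hugepage aligned:

    #include <sys/mman.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Replace the whole large page containing 'offset' in place. */
    static int remap_whole_hugetlb_page(void *host, int fd,
                                        uint64_t fd_offset,
                                        size_t offset, size_t page_size)
    {
        size_t start = offset & ~(page_size - 1);   /* align down */
        void *vaddr = (char *)host + start;

        /* MAP_FIXED must cover the full page: partial hugetlb
         * mappings are refused by the kernel. */
        if (mmap(vaddr, page_size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd,
                 (off_t)(fd_offset + start)) == MAP_FAILED) {
            return -errno;
        }
        return 0;
    }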

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 accel/kvm/kvm-all.c       |  2 +-
 include/exec/cpu-common.h |  2 +-
 system/physmem.c          | 38 +++++++++++++++++++++++++++++---------
 3 files changed, 31 insertions(+), 11 deletions(-)
...
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory, fall back to remapping the
location(s).
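
The resulting policy can be sketched as follows (hypothetical helper
names; the real code is in the hunk below):

    /* Try to discard the poisoned page; if the kernel lacks the
     * needed madvise support, only anonymous memory may fall back
     * to a fresh anonymous mapping. */
    static void discard_or_remap(RAMBlock *block, void *vaddr,
                                 size_t page_size)
    {
        if (discard_with_madvise(vaddr, page_size) == 0) {
            return;
        }
        if (block->fd >= 0) {
            exit(1);  /* file-backed: mmap() alone may not recover it */
        }
        if (mmap(vaddr, page_size, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
            == MAP_FAILED) {
            exit(1);
        }
    }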

Signed-off-by: William Roche <william.roche@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 system/physmem.c | 58 ++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
...
+                 * Fall back to using mmap() only for anonymous mapping,
+                 * as if a backing file is associated we may not be able
+                 * to recover the memory in all cases.
+                 * So don't take the risk of using only mmap and fail now.
+                 */
+                if (block->fd >= 0) {
+                    error_report("Could not remap RAM %s:%" PRIx64 "+%"
+                                 PRIx64 " +%zx", block->idstr, offset,
+                                 block->fd_offset, page_size);
+                    exit(1);
+                }
+                if (qemu_ram_remap_mmap(block, offset, page_size) != 0) {
+                    error_report("Could not remap RAM %s:%" PRIx64 " +%zx",
+                                 block->idstr, offset, page_size);
+                    exit(1);
+                }
             }
             memory_try_enable_merging(vaddr, page_size);
             qemu_ram_setup_dump(vaddr, page_size);
--
2.43.5
...
the introduction of an x86 similar error injection message.

When a large page is impacted, we now report:
Memory Error on large page from <backend>:<address>+<fd_offset> +<page_size>

The +<fd_offset> information is only provided with a file backend.
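
For example (hypothetical values), a lost 2M page on a file backend
could be reported as:

    Memory Error on large page from /objects/mem1:3e00000+0 +200000

whereas the same error on anonymous memory would omit the +<fd_offset>
part.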

Signed-off-by: William Roche <william.roche@oracle.com>
---
 accel/kvm/kvm-all.c       | 18 ++++++++++++++++++
 include/exec/cpu-common.h | 10 ++++++++++
 system/physmem.c          | 22 ++++++++++++++++++++++
 target/arm/kvm.c          |  3 +++
 4 files changed, 53 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ static void kvm_unpoison_all(void *param)
 void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
     HWPoisonPage *page;
+    struct RAMBlockInfo rb_info;
+
+    if (qemu_ram_block_info_from_addr(ram_addr, &rb_info)) {
+        size_t ps = rb_info.page_size;
+
+        if (ps > TARGET_PAGE_SIZE) {
+            uint64_t offset = QEMU_ALIGN_DOWN(ram_addr - rb_info.offset, ps);
+
+            if (rb_info.fd >= 0) {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             "+%" PRIx64 " +%zx", rb_info.idstr, offset,
+                             rb_info.fd_offset, ps);
+            } else {
+                error_report("Memory Error on large page from %s:%" PRIx64
+                             " +%zx", rb_info.idstr, offset, ps);
+            }
+        }
+    }
 
     QLIST_FOREACH(page, &hwpoison_page_list, list) {
         if (page->ram_addr == ram_addr) {
...
 size_t qemu_ram_pagesize_largest(void);
 
+struct RAMBlockInfo {
+    char idstr[256];
+    ram_addr_t offset;
+    int fd;
+    uint64_t fd_offset;
+    size_t page_size;
+};
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *block);
+
 /**
  * cpu_address_space_init:
  * @cpu: CPU to add this address space to
diff --git a/system/physmem.c b/system/physmem.c
...
@@ -XXX,XX +XXX,XX @@ size_t qemu_ram_pagesize_largest(void)
     return largest;
 }
 
+/* Copy RAMBlock information associated to the given ram_addr location */
+bool qemu_ram_block_info_from_addr(ram_addr_t ram_addr,
+                                   struct RAMBlockInfo *b_info)
+{
+    RAMBlock *rb;
+
+    assert(b_info);
+
...
+        return false;
+    }
+
+    pstrcat(b_info->idstr, sizeof(b_info->idstr), rb->idstr);
+    b_info->offset = rb->offset;
+    b_info->fd = rb->fd;
+    b_info->fd_offset = rb->fd_offset;
+    b_info->page_size = rb->page_size;
+    return true;
+}
+
...
From: David Hildenbrand <david@redhat.com>

Notify registered listeners about the remap at the end of
qemu_ram_remap() so that e.g., a memory backend can re-apply its
settings correctly.
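
A consumer fills in the new callback and registers the notifier; a
minimal sketch with a hypothetical listener (patch 6 does this for
real in the hostmem backend):

    static void my_ram_remapped(RAMBlockNotifier *n, void *host,
                                size_t offset, size_t size)
    {
        /* Re-apply settings on [host + offset, host + offset + size). */
    }

    static RAMBlockNotifier my_notifier = {
        .ram_block_remapped = my_ram_remapped,
    };

    ram_block_notifier_add(&my_notifier);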

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 hw/core/numa.c         | 11 +++++++++++
 include/exec/ramlist.h |  3 +++
 system/physmem.c       |  1 +
 3 files changed, 15 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -XXX,XX +XXX,XX @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
         }
     }
 }
+
+void ram_block_notify_remap(void *host, size_t offset, size_t size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_remapped) {
+            notifier->ram_block_remapped(notifier, host, offset, size);
+        }
+    }
+}
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -XXX,XX +XXX,XX @@ struct RAMBlockNotifier {
                               size_t max_size);
     void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size,
                               size_t new_size);
+    void (*ram_block_remapped)(RAMBlockNotifier *n, void *host, size_t offset,
+                               size_t size);
     QLIST_ENTRY(RAMBlockNotifier) next;
 };
 
@@ -XXX,XX +XXX,XX @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
 void ram_block_notify_add(void *host, size_t size, size_t max_size);
 void ram_block_notify_remove(void *host, size_t size, size_t max_size);
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
+void ram_block_notify_remap(void *host, size_t offset, size_t size);
 
 GString *ram_block_format(void);
 
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
             }
             memory_try_enable_merging(vaddr, page_size);
             qemu_ram_setup_dump(vaddr, page_size);
+            ram_block_notify_remap(block->host, offset, page_size);
         }
 
         break;
--
2.43.5
From: David Hildenbrand <david@redhat.com>

We want to reuse the functionality when remapping RAM.
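
With the settings application factored out, the initial allocation
path and the remap path introduced by the next patch can share it:

    /* at backend creation (this patch): */
    host_memory_backend_apply_settings(backend, ptr, sz, errp);

    /* on a remap notification (next patch), on the remapped range only: */
    host_memory_backend_apply_settings(backend, host + offset, size, &err);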

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c | 155 ++++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 73 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index XXXXXXX..XXXXXXX 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -XXX,XX +XXX,XX @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
 #endif
 
+static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
+                                               void *ptr, uint64_t size,
+                                               Error **errp)
+{
+    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
+
+    if (backend->merge) {
+        qemu_madvise(ptr, size, QEMU_MADV_MERGEABLE);
+    }
+    if (!backend->dump) {
+        qemu_madvise(ptr, size, QEMU_MADV_DONTDUMP);
+    }
+#ifdef CONFIG_NUMA
+    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+    /* lastbit == MAX_NODES means maxnode = 0 */
+    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+    /*
+     * Ensure policy won't be ignored in case memory is preallocated
+     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
+     * this doesn't catch hugepage case.
+     */
+    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+    int mode = backend->policy;
+
+    /*
+     * Check for invalid host-nodes and policies and give more verbose
+     * error messages than mbind().
+     */
+    if (maxnode && backend->policy == MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be empty for policy default,"
+                   " or you should explicitly specify a policy other"
+                   " than default");
+        return;
+    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be set for policy %s",
+                   HostMemPolicy_str(backend->policy));
+        return;
+    }
+
+    /*
+     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
+     * as argument to mbind() due to an old Linux bug (feature?) which
+     * cuts off the last specified node. This means backend->host_nodes
+     * must have MAX_NODES+1 bits available.
+     */
+    assert(sizeof(backend->host_nodes) >=
+           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+    assert(maxnode <= MAX_NODES);
+
+#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
+    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
+        /*
+         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+         * silently picks the first node.
+         */
+        mode = MPOL_PREFERRED_MANY;
+    }
+#endif
+
+    if (maxnode &&
+        mbind(ptr, size, mode, backend->host_nodes, maxnode + 1, flags)) {
+        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+            error_setg_errno(errp, errno,
+                             "cannot bind memory to host NUMA nodes");
+            return;
+        }
+    }
+#endif
+    /*
+     * Preallocate memory after the NUMA policy has been instantiated.
+     * This is necessary to guarantee memory is allocated with
+     * specified NUMA policy in place.
+     */
+    if (backend->prealloc &&
+        !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
+                           ptr, size, backend->prealloc_threads,
+                           backend->prealloc_context, async, errp)) {
+        return;
+    }
+}
+
 char *
 host_memory_backend_get_name(HostMemoryBackend *backend)
 {
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
     void *ptr;
     uint64_t sz;
     size_t pagesize;
-    bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
 
     if (!bc->alloc) {
         return;
@@ -XXX,XX +XXX,XX @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
-    if (backend->merge) {
-        qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
-    }
-    if (!backend->dump) {
-        qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
-    }
-#ifdef CONFIG_NUMA
-    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-    /* lastbit == MAX_NODES means maxnode = 0 */
-    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-    /*
-     * Ensure policy won't be ignored in case memory is preallocated
-     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
-     * this doesn't catch hugepage case.
-     */
-    unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-    int mode = backend->policy;
-
-    /* check for invalid host-nodes and policies and give more verbose
-     * error messages than mbind(). */
-    if (maxnode && backend->policy == MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be empty for policy default,"
-                   " or you should explicitly specify a policy other"
-                   " than default");
-        return;
-    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be set for policy %s",
-                   HostMemPolicy_str(backend->policy));
-        return;
-    }
-
-    /*
-     * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
-     * as argument to mbind() due to an old Linux bug (feature?) which
-     * cuts off the last specified node. This means backend->host_nodes
-     * must have MAX_NODES+1 bits available.
-     */
-    assert(sizeof(backend->host_nodes) >=
-           BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-    assert(maxnode <= MAX_NODES);
-
-#ifdef HAVE_NUMA_HAS_PREFERRED_MANY
-    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
-        /*
-         * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
-         * silently picks the first node.
-         */
-        mode = MPOL_PREFERRED_MANY;
-    }
-#endif
-
-    if (maxnode &&
-        mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
-        if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
-            error_setg_errno(errp, errno,
-                             "cannot bind memory to host NUMA nodes");
-            return;
-        }
-    }
-#endif
-    /*
-     * Preallocate memory after the NUMA policy has been instantiated.
-     * This is necessary to guarantee memory is allocated with
-     * specified NUMA policy in place.
-     */
-    if (backend->prealloc && !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
-                                                ptr, sz,
-                                                backend->prealloc_threads,
-                                                backend->prealloc_context,
-                                                async, errp)) {
-        return;
-    }
+    host_memory_backend_apply_settings(backend, ptr, sz, errp);
 }
 
 static bool
--
2.43.5
From: William Roche <william.roche@oracle.com>

Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.

Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: William Roche <william.roche@oracle.com>
---
 backends/hostmem.c       | 34 ++++++++++++++++++++++++++++++++++
 include/system/hostmem.h |  1 +
 system/physmem.c         |  4 ----
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index XXXXXXX..XXXXXXX 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -XXX,XX +XXX,XX @@ static void host_memory_backend_set_prealloc_threads(Object *obj, Visitor *v,
     backend->prealloc_threads = value;
 }
 
+static void host_memory_backend_ram_remapped(RAMBlockNotifier *n, void *host,
+                                             size_t offset, size_t size)
+{
+    HostMemoryBackend *backend = container_of(n, HostMemoryBackend,
+                                              ram_notifier);
+    Error *err = NULL;
+
+    if (!host_memory_backend_mr_inited(backend) ||
+        memory_region_get_ram_ptr(&backend->mr) != host) {
+        return;
+    }
+
+    host_memory_backend_apply_settings(backend, host + offset, size, &err);
+    if (err) {
+        /*
+         * If memory settings can't be successfully applied on remap,
+         * don't take the risk to continue without them.
+         */
+        error_report_err(err);
+        exit(1);
+    }
+}
+
 static void host_memory_backend_init(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
     MachineState *machine = MACHINE(qdev_get_machine());
 
+    backend->ram_notifier.ram_block_remapped = host_memory_backend_ram_remapped;
+    ram_block_notifier_add(&backend->ram_notifier);
+
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
@@ -XXX,XX +XXX,XX @@ static void host_memory_backend_post_init(Object *obj)
     object_apply_compat_props(obj);
 }
 
+static void host_memory_backend_finalize(Object *obj)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    ram_block_notifier_remove(&backend->ram_notifier);
+}
+
 bool host_memory_backend_mr_inited(HostMemoryBackend *backend)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static const TypeInfo host_memory_backend_info = {
     .instance_size = sizeof(HostMemoryBackend),
     .instance_init = host_memory_backend_init,
     .instance_post_init = host_memory_backend_post_init,
+    .instance_finalize = host_memory_backend_finalize,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_USER_CREATABLE },
         { }
diff --git a/include/system/hostmem.h b/include/system/hostmem.h
index XXXXXXX..XXXXXXX 100644
--- a/include/system/hostmem.h
+++ b/include/system/hostmem.h
@@ -XXX,XX +XXX,XX @@ struct HostMemoryBackend {
     HostMemPolicy policy;
 
     MemoryRegion mr;
+    RAMBlockNotifier ram_notifier;
 };
 
 bool host_memory_backend_mr_inited(HostMemoryBackend *backend);
diff --git a/system/physmem.c b/system/physmem.c
index XXXXXXX..XXXXXXX 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
 {
     RAMBlock *block;
     uint64_t offset;
-    void *vaddr;
     size_t page_size;
 
     RAMBLOCK_FOREACH(block) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
             page_size = qemu_ram_pagesize(block);
             offset = QEMU_ALIGN_DOWN(offset, page_size);
 
-            vaddr = ramblock_ptr(block, offset);
             if (block->flags & RAM_PREALLOC) {
                 ;
             } else if (xen_enabled()) {
@@ -XXX,XX +XXX,XX @@ void qemu_ram_remap(ram_addr_t addr)
                     exit(1);
                 }
             }
-            memory_try_enable_merging(vaddr, page_size);
-            qemu_ram_setup_dump(vaddr, page_size);
             ram_block_notify_remap(block->host, offset, page_size);
         }
 
--
2.43.5