The virtio-mem is paravirtualized mechanism of adding/removing
memory to/from a VM. A virtio-mem-pci device is split into blocks
of equal size which are then exposed (all or only a requested
portion of them) to the guest kernel to use as regular memory.
Therefore, the device has two important attributes:
1) block-size, which defines the size of a block
2) requested-size, which defines how much memory (in bytes)
is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable
throughout device's lifetime. The 'requested-size' can be set on
the command line too, but also is adjustable via monitor. In
fact, that is how management software places its requests to
change the memory allocation. If it wants to give more memory to
the guest it changes 'requested-size' to a bigger value, and if it
wants to shrink guest memory it changes the 'requested-size' to a
smaller value. Note, value of zero means that guest should
release all memory offered by the device. Of course, guest has to
cooperate. Therefore, there is a third attribute 'size' which is
read only and reflects how much memory the guest still has. This
can be different to 'requested-size', obviously. Because of name
clash, I've named it 'actualsize' and it is dealt with in future
commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects:
memory-backend-{ram,file,memfd} and their size puts the cap on
the amount of memory that a virtio-mem device can offer to a
guest. But we are already able to express this info using <size/>
under <target/>.
Therefore, we need only two more elements to cover 'block-size'
and 'requested-size' attributes. This is the XML I've came up
with:
<memory model='virtio-mem'>
<source>
<nodemask>1-3</nodemask>
<pagesize unit='KiB'>2048</pagesize>
</source>
<target>
<size unit='KiB'>2097152</size>
<node>0</node>
<block unit='KiB'>2048</block>
<requested unit='KiB'>1048576</requested>
</target>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of
'block-size', and
2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
address.
Then there is a limitation that the minimal 'block-size' is
transparent huge page size (I'll leave this without explanation).
Since now we have (possibly) two or more devices that allow
memory inflation/deflation and accounting for all of them (and
thus keeping <currentMemory/> updated) might be hard. Therefore,
I'm deliberately forbidding memballoon. It's okay - virtio-mem is
superior to memballoon anyway. We can always reevaluate later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
docs/formatdomain.rst | 51 +++++++++++++--
docs/schemas/domaincommon.rng | 11 ++++
src/conf/domain_conf.c | 53 ++++++++++++++-
src/conf/domain_conf.h | 3 +
src/conf/domain_validate.c | 39 +++++++++++
src/qemu/qemu_alias.c | 1 +
src/qemu/qemu_command.c | 1 +
src/qemu/qemu_domain.c | 27 +++++++-
src/qemu/qemu_domain_address.c | 38 ++++++++---
src/qemu/qemu_process.c | 2 +
src/qemu/qemu_validate.c | 22 ++++++-
src/security/security_apparmor.c | 1 +
src/security/security_dac.c | 2 +
src/security/security_selinux.c | 2 +
.../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++
...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 +
tests/qemuxml2xmltest.c | 1 +
17 files changed, 298 insertions(+), 21 deletions(-)
create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 2587106191..05d359a32d 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -6924,6 +6924,8 @@ appropriate, so there is no need to explicitly add this element in the guest XML
unless a specific PCI slot needs to be assigned. :since:`Since 0.8.3, Xen, QEMU
and KVM only` Additionally, :since:`since 0.8.4` , if the memballoon device
needs to be explicitly disabled, ``model='none'`` may be used.
+Using memballoon among with ``virtio-mem`` memory device is unsupported. In
+that case, setting model to anything else then ``none`` results in an error.
Example: automatically added device with KVM
@@ -6935,6 +6937,16 @@ Example: automatically added device with KVM
</devices>
...
+Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device:
+
+::
+
+ ...
+ <devices>
+ <memballoon model='none'/>
+ </devices>
+ ...
+
Example: manually added device with static PCI slot 2 requested
::
@@ -7374,6 +7386,18 @@ Example: usage of the memory devices
<size unit='KiB'>524288</size>
</target>
</memory>
+ <memory model='virtio-mem'>
+ <source>
+ <nodemask>1-3</nodemask>
+ <pagesize unit='KiB'>2048</pagesize>
+ </source>
+ <target>
+ <size unit='KiB'>2097152</size>
+ <node>0</node>
+ <block unit='KiB'>2048</block>
+ <requested unit='KiB'>1048576</requested>
+ </target>
+ </memory>
</devices>
...
@@ -7381,7 +7405,9 @@ Example: usage of the memory devices
Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since
1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
:since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized
- persistent memory device. :since:`Since 7.1.0`
+ persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model
+ to add paravirtualized memory device. :since:`Since 7.1.0` Please note that
+ using ``virtio-mem`` with memory balloon is unsupported.
``access``
An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
@@ -7404,10 +7430,11 @@ Example: usage of the memory devices
allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0`
``source``
- For model ``dimm`` this element is optional and allows to fine tune the
- source of the memory used for the given memory device. If the element is not
- provided defaults configured via ``numatune`` are used. If ``dimm`` is
- provided, then the following optional elements can be provided as well:
+ For model ``dimm`` and model ``virtio-mem`` this element is optional and
+ allows to fine tune the source of the memory used for the given memory
+ device. If the element is not provided defaults configured via ``numatune``
+ are used. If the element is provided, then the following optional elements
+ can be provided:
``pagesize``
This element can be used to override the default host page size used for
@@ -7446,7 +7473,8 @@ Example: usage of the memory devices
added memory from the perspective of the guest.
The mandatory ``size`` subelement configures the size of the added memory as
- a scaled integer.
+ a scaled integer. For ``virtio-mem`` this represents the maximum possible
+ size exposed to the guest.
The ``node`` subelement configures the guest NUMA node to attach the memory
to. The element shall be used only if the guest has NUMA nodes configured.
@@ -7473,6 +7501,17 @@ Example: usage of the memory devices
so other backend types should use the ``readonly`` element. :since:`Since
5.0.0`
+ ``block``
+ For ``virtio-mem`` only.
+ The size of an individual block, granularity of division of memory module.
+ Must be power of two and at least equal to size of a transparent hugepage
+ (2MiB on x84_64). The default is hypervisor dependent.
+
+ ``requested``
+ For ``virtio-mem`` only.
+ The total size exposed to the guest. Must respect ``block`` granularity
+ and be smaller or equal to ``size``.
+
:anchor:`<a id="elementsIommu"/>`
IOMMU devices
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index e6de934456..1f672756bd 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -6056,6 +6056,7 @@
<value>dimm</value>
<value>nvdimm</value>
<value>virtio-pmem</value>
+ <value>virtio-mem</value>
</choice>
</attribute>
<optional>
@@ -6140,6 +6141,16 @@
<ref name="unsignedInt"/>
</element>
</optional>
+ <optional>
+ <element name="block">
+ <ref name="scaledInteger"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="requested">
+ <ref name="scaledInteger"/>
+ </element>
+ </optional>
<optional>
<element name="label">
<element name="size">
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index b731744f04..f4c08ef43d 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1329,6 +1329,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel,
"dimm",
"nvdimm",
"virtio-pmem",
+ "virtio-mem",
);
VIR_ENUM_IMPL(virDomainShmemModel,
@@ -5369,6 +5370,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem,
}
break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
@@ -15392,6 +15394,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node,
switch (def->model) {
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt,
&def->pagesize, false, false) < 0)
return -1;
@@ -15458,7 +15461,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
&def->size, true, false) < 0)
return -1;
- if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
+ switch (def->model) {
+ case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt,
&def->labelsize, false, false) < 0)
return -1;
@@ -15477,6 +15481,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
if (virXPathBoolean("boolean(./readonly)", ctxt))
def->readonly = true;
+ break;
+
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ if (virDomainParseMemory("./block", "./block/@unit", ctxt,
+ &def->blocksize, false, false) < 0)
+ return -1;
+
+ if (virDomainParseMemory("./requested", "./requested/@unit", ctxt,
+ &def->requestedsize, false, false) < 0)
+ return -1;
+ break;
+
+ case VIR_DOMAIN_MEMORY_MODEL_NONE:
+ case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_LAST:
+ break;
}
return 0;
@@ -17294,11 +17315,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def,
/* target info -> always present */
if (tmp->model != mem->model ||
tmp->targetNode != mem->targetNode ||
- tmp->size != mem->size)
+ tmp->size != mem->size ||
+ tmp->blocksize != mem->blocksize ||
+ tmp->requestedsize != mem->requestedsize)
continue;
switch (mem->model) {
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
/* source stuff -> match with device */
if (tmp->pagesize != mem->pagesize)
continue;
@@ -22873,6 +22897,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr src,
return false;
}
+ if (src->blocksize != dst->blocksize) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("Target memory device block size '%llu' doesn't match "
+ "source memory device block size '%llu'"),
+ dst->blocksize, src->blocksize);
+ return false;
+ }
+
+ if (src->requestedsize != dst->requestedsize) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("Target memory device requested size '%llu' doesn't match "
+ "source memory device requested size '%llu'"),
+ dst->requestedsize, src->requestedsize);
+ return false;
+ }
+
if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
if (src->labelsize != dst->labelsize) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
@@ -26629,6 +26669,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf,
switch (def->model) {
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
if (def->sourceNodes) {
if (!(bitmap = virBitmapFormat(def->sourceNodes)))
return -1;
@@ -26685,6 +26726,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf,
if (def->readonly)
virBufferAddLit(&childBuf, "<readonly/>\n");
+ if (def->blocksize) {
+ virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n",
+ def->blocksize);
+
+ virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n",
+ def->requestedsize);
+ }
+
virXMLFormatElement(buf, "target", NULL, &childBuf);
}
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 930eed60de..14d238df4b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2312,6 +2312,7 @@ typedef enum {
VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */
VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */
+ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */
VIR_DOMAIN_MEMORY_MODEL_LAST
} virDomainMemoryModel;
@@ -2332,6 +2333,8 @@ struct _virDomainMemoryDef {
int targetNode;
unsigned long long size; /* kibibytes */
unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */
+ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */
+ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */
bool readonly; /* valid only for NVDIMM */
/* required for QEMU NVDIMM ppc64 support */
diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
index b47ecba86b..7ab91c8f2f 100644
--- a/src/conf/domain_validate.c
+++ b/src/conf/domain_validate.c
@@ -26,6 +26,7 @@
#include "virconftypes.h"
#include "virlog.h"
#include "virutil.h"
+#include "virhostmem.h"
#define VIR_FROM_THIS VIR_FROM_DOMAIN
@@ -1612,6 +1613,8 @@ static int
virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
const virDomainDef *def)
{
+ unsigned long long thpSize;
+
switch (mem->model) {
case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
if (!mem->nvdimmPath) {
@@ -1665,6 +1668,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
_("virtio-pmem does not support NUMA nodes"));
return -1;
}
+ break;
+
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ if (mem->requestedsize > mem->size) {
+ virReportError(VIR_ERR_XML_DETAIL, "%s",
+ _("requested size must be smaller than @size"));
+ return -1;
+ }
+
+ if (!VIR_IS_POW2(mem->blocksize)) {
+ virReportError(VIR_ERR_XML_DETAIL, "%s",
+ _("block size must be a power of two"));
+ return -1;
+ }
+
+ if (virHostMemGetTHPSize(&thpSize) < 0) {
+ /* We failed to get THP size, fall back to a sane default. On
+ * almost every architecture the size will be 2MiB, except for some
+ * funky arches like sparc and m68k. Use 2MiB and refine later if
+ * somebody complains. */
+ thpSize = 2048;
+ }
+
+ if (mem->blocksize < thpSize) {
+ virReportError(VIR_ERR_XML_DETAIL,
+ _("block size too small, must be at least %lluKiB"),
+ thpSize);
+ return -1;
+ }
+
+ if (mem->requestedsize % mem->blocksize != 0) {
+ virReportError(VIR_ERR_XML_DETAIL, "%s",
+ _("requested size must be an integer multiple of block size"));
+ return -1;
+ }
+ break;
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
break;
diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c
index 1c6f04c0ba..1a330829a3 100644
--- a/src/qemu/qemu_alias.c
+++ b/src/qemu/qemu_alias.c
@@ -522,6 +522,7 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def,
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
prefix = "virtiopmem";
break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
default:
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index d801018aa2..ac9a8c9f6a 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3355,6 +3355,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def,
device = "virtio-pmem-pci";
break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
default:
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 541d592bbe..26b9b5583e 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def,
}
if (addDefaultMemballoon && !def->memballoon) {
- virDomainMemballoonDefPtr memballoon;
- memballoon = g_new0(virDomainMemballoonDef, 1);
+ virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1);
+ size_t i;
- memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO;
+ /* To simplify virtio-mem implementation, memballoon has to be turned
+ * off if domain has a virtio-mem device. See
+ * qemuValidateDomainDeviceDefMemory() for more details. */
+ for (i = 0; i < def->nmems; i++) {
+ if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
+ break;
+ }
+
+ if (i == def->nmems)
+ memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO;
+ else
+ memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE;
def->memballoon = memballoon;
}
@@ -8775,6 +8786,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem,
needsNuma = false;
break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI &&
+ mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("only 'pci' addresses are supported for the %s device"),
+ virDomainMemoryModelTypeToString(mem->model));
+ return -1;
+ }
+ break;
+
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
return -1;
diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index 68dbf9e95b..64c57f1cad 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def,
}
for (i = 0; i < def->nmems; i++) {
- if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM &&
- def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE)
- def->mems[i]->info.type = type;
+ switch (def->mems[i]->model) {
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE)
+ def->mems[i]->info.type = type;
+ break;
+ case VIR_DOMAIN_MEMORY_MODEL_NONE:
+ case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_LAST:
+ break;
+ }
}
if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) {
@@ -1011,6 +1020,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev,
case VIR_DOMAIN_DEVICE_MEMORY:
switch (dev->data.memory->model) {
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
return virtioFlags;
case VIR_DOMAIN_MEMORY_MODEL_NONE:
@@ -2438,12 +2448,19 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def,
for (i = 0; i < def->nmems; i++) {
virDomainMemoryDefPtr mem = def->mems[i];
- if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM ||
- !virDeviceInfoPCIAddressIsWanted(&mem->info))
- continue;
-
- if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0)
- return -1;
+ switch (mem->model) {
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ if (virDeviceInfoPCIAddressIsWanted(&mem->info) &&
+ qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0)
+ return -1;
+ break;
+ case VIR_DOMAIN_MEMORY_MODEL_NONE:
+ case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_LAST:
+ break;
+ }
}
return 0;
@@ -3110,6 +3127,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriverPtr driver,
break;
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
return qemuDomainEnsurePCIAddress(vm, &dev, driver);
break;
@@ -3135,6 +3153,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObjPtr vm,
break;
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
qemuDomainReleaseDeviceAddress(vm, &mem->info);
break;
@@ -3169,6 +3188,7 @@ qemuDomainAssignMemorySlots(virDomainDefPtr def)
break;
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
/* handled in qemuDomainAssignPCIAddresses() */
break;
case VIR_DOMAIN_MEMORY_MODEL_NONE:
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 066f153703..ebf42259b5 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3869,6 +3869,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem,
{
switch (mem->model) {
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
return mem->pagesize &&
mem->pagesize != system_pagesize;
@@ -3933,6 +3934,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDefPtr def,
if (mem) {
switch (mem->model) {
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) {
/* No need to check for access mode on the target node,
* it was checked for in the previous loop. */
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 2541ae856a..6150083251 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -4653,6 +4653,7 @@ qemuValidateDomainDeviceDefHub(virDomainHubDefPtr hub,
static int
qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem,
+ const virDomainDef *def,
virQEMUCapsPtr qemuCaps)
{
switch (mem->model) {
@@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem,
}
break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+ /* Accounting balloon and virtio-mem is hard. We have plenty of APIs
+ * which take balloon from QEMU and report it to users. We would have
+ * to change all that and account for virtio-mem actual size. Also,
+ * virtio-mem is supposed to be replacement for balloon. Disable
+ * coexistence of these two for now. */
+ if (virDomainDefHasMemballoon(def)) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+ _("virtio-mem is not supported with memory balloon"));
+ return -1;
+ }
+
+ if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+ _("virtio-mem isn't supported by this QEMU binary"));
+ return -1;
+ }
+ break;
+
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
break;
@@ -4842,7 +4862,7 @@ qemuValidateDomainDeviceDef(const virDomainDeviceDef *dev,
break;
case VIR_DOMAIN_DEVICE_MEMORY:
- ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, qemuCaps);
+ ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, def, qemuCaps);
break;
case VIR_DOMAIN_DEVICE_SHMEM:
diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c
index e6ecb513b1..5aeaf975a1 100644
--- a/src/security/security_apparmor.c
+++ b/src/security/security_apparmor.c
@@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManagerPtr mgr,
return reload_profile(mgr, def, mem->nvdimmPath, true);
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
break;
}
diff --git a/src/security/security_dac.c b/src/security/security_dac.c
index 00eeae0d27..5ae645f25b 100644
--- a/src/security/security_dac.c
+++ b/src/security/security_dac.c
@@ -1871,6 +1871,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManagerPtr mgr,
break;
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
ret = 0;
@@ -2054,6 +2055,7 @@ virSecurityDACSetMemoryLabel(virSecurityManagerPtr mgr,
break;
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
ret = 0;
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c
index 1d1d9edfff..c0a3b28044 100644
--- a/src/security/security_selinux.c
+++ b/src/security/security_selinux.c
@@ -1584,6 +1584,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManagerPtr mgr,
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
break;
}
@@ -1611,6 +1612,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManagerPtr mgr,
break;
case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
case VIR_DOMAIN_MEMORY_MODEL_NONE:
case VIR_DOMAIN_MEMORY_MODEL_LAST:
ret = 0;
diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
new file mode 100644
index 0000000000..eed9e35d0d
--- /dev/null
+++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
@@ -0,0 +1,64 @@
+<domain type='kvm'>
+ <name>QEMUGuest1</name>
+ <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+ <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory>
+ <memory unit='KiB'>8388608</memory>
+ <currentMemory unit='KiB'>8388608</currentMemory>
+ <vcpu placement='static' cpuset='0-1'>2</vcpu>
+ <os>
+ <type arch='x86_64' machine='pc'>hvm</type>
+ <boot dev='hd'/>
+ </os>
+ <cpu mode='custom' match='exact' check='none'>
+ <model fallback='forbid'>qemu64</model>
+ <topology sockets='2' dies='1' cores='1' threads='1'/>
+ <numa>
+ <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/>
+ </numa>
+ </cpu>
+ <clock offset='utc'/>
+ <on_poweroff>destroy</on_poweroff>
+ <on_reboot>restart</on_reboot>
+ <on_crash>destroy</on_crash>
+ <devices>
+ <emulator>/usr/bin/qemu-system-i386</emulator>
+ <disk type='block' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source dev='/dev/HostVG/QEMUGuest1'/>
+ <target dev='hda' bus='ide'/>
+ <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+ </disk>
+ <controller type='ide' index='0'>
+ <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
+ </controller>
+ <controller type='usb' index='0' model='piix3-uhci'>
+ <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
+ </controller>
+ <controller type='pci' index='0' model='pci-root'/>
+ <input type='mouse' bus='ps2'/>
+ <input type='keyboard' bus='ps2'/>
+ <memballoon model='none'/>
+ <memory model='virtio-mem'>
+ <target>
+ <size unit='KiB'>1048576</size>
+ <node>0</node>
+ <block unit='KiB'>2048</block>
+ <requested unit='KiB'>524288</requested>
+ </target>
+ <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
+ </memory>
+ <memory model='virtio-mem'>
+ <source>
+ <nodemask>1-3</nodemask>
+ <pagesize unit='KiB'>2048</pagesize>
+ </source>
+ <target>
+ <size unit='KiB'>2097152</size>
+ <node>0</node>
+ <block unit='KiB'>2048</block>
+ <requested unit='KiB'>1048576</requested>
+ </target>
+ <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+ </memory>
+ </devices>
+</domain>
diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
new file mode 120000
index 0000000000..a9d298129c
--- /dev/null
+++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
@@ -0,0 +1 @@
+../qemuxml2argvdata/memory-hotplug-virtio-mem.xml
\ No newline at end of file
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index 5cd945f28f..3a77f8d6c0 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -1267,6 +1267,7 @@ mymain(void)
QEMU_CAPS_DEVICE_NVDIMM,
QEMU_CAPS_LAST);
DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem");
+ DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem");
DO_TEST("net-udp", NONE);
--
2.26.2
On 18.02.21 14:31, Michal Privoznik wrote:
> The virtio-mem is paravirtualized mechanism of adding/removing
> memory to/from a VM. A virtio-mem-pci device is split into blocks
> of equal size which are then exposed (all or only a requested
> portion of them) to the guest kernel to use as regular memory.
> Therefore, the device has two important attributes:
>
> 1) block-size, which defines the size of a block
> 2) requested-size, which defines how much memory (in bytes)
> is the device requested to expose to the guest.
>
> The 'block-size' is configured on command line and immutable
> throughout device's lifetime. The 'requested-size' can be set on
> the command line too, but also is adjustable via monitor. In
> fact, that is how management software places its requests to
> change the memory allocation. If it wants to give more memory to
> the guest it changes 'requested-size' to a bigger value, and if it
> wants to shrink guest memory it changes the 'requested-size' to a
> smaller value. Note, value of zero means that guest should
> release all memory offered by the device. Of course, guest has to
> cooperate. Therefore, there is a third attribute 'size' which is
> read only and reflects how much memory the guest still has. This
> can be different to 'requested-size', obviously. Because of name
> clash, I've named it 'actualsize' and it is dealt with in future
> commits (it is a runtime information anyway).
>
> In the backend, memory for virtio-mem is backed by usual objects:
> memory-backend-{ram,file,memfd} and their size puts the cap on
> the amount of memory that a virtio-mem device can offer to a
> guest. But we are already able to express this info using <size/>
> under <target/>.
>
> Therefore, we need only two more elements to cover 'block-size'
> and 'requested-size' attributes. This is the XML I've came up
> with:
>
> <memory model='virtio-mem'>
> <source>
> <nodemask>1-3</nodemask>
> <pagesize unit='KiB'>2048</pagesize>
> </source>
> <target>
> <size unit='KiB'>2097152</size>
> <node>0</node>
> <block unit='KiB'>2048</block>
> <requested unit='KiB'>1048576</requested>
> </target>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> </memory>
>
> I hope by now it is obvious that:
>
> 1) 'requested-size' must be an integer multiple of
> 'block-size', and
> 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
> address.
>
> Then there is a limitation that the minimal 'block-size' is
> transparent huge page size (I'll leave this without explanation).
>
> Since now we have (possibly) two or more devices that allow
> memory inflation/deflation and accounting for all of them (and
> thus keeping <currentMemory/> updated) might be hard. Therefore,
> I'm deliberately forbidding memballoon. It's okay - virtio-mem is
> superior to memballoon anyway. We can always reevaluate later.
That's a bad idea. It'll still be used for getting memory stats, free
page hinting and free page reporting.
Very weird use cases might even want to mix balloon inflation/deflation
with virtio-mem ...
--
Thanks,
David / dhildenb
On 2/18/21 4:00 PM, David Hildenbrand wrote:
> On 18.02.21 14:31, Michal Privoznik wrote:
>>
>> Since now we have (possibly) two or more devices that allow
>> memory inflation/deflation and accounting for all of them (and
>> thus keeping <currentMemory/> updated) might be hard. Therefore,
>> I'm deliberately forbidding memballoon. It's okay - virtio-mem is
>> superior to memballoon anyway. We can always reevaluate later.
>
> That's a bad idea. It'll still be used for getting memory stats, free
> page hinting and free page reporting.
>
> Very weird use cases might even want to mix balloon inflation/deflation
> with virtio-mem ...
>
I understand that, but then accounting for memory at libvirt level
becomes hard. What I mean, currently we have three elements for
reporting guest memory:
<domain>
<maxMemory slots='16' unit='KiB'>1524288</maxMemory>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
...
</domain>
<maxMemory/> is for memory hotplug (corresponds to -m maxmem=X,slots=Y)
<memory/> is the initial amount guest has PLUS sum of sizes of all
memory devices. For instance if I want to give 1GiB initial amount and
have 1 pc-dimm of 2GiB size this element will report 3GiB.
<currentMemory/> then reflects actual balloon size (gets updated on
BALLOON_CHANGE event and when libvirt finds fit via query-balloon).
In my experiments, balloon does not account for virtio-mem actual size.
Therefore, I can see following options:
1) fix <currentMemory/> so that it does balloon size +
sum(virtio-mem.actual), or
2) don't touch <currentMemory/> at all and let mgmt apps compute the sum
themselves (should they need it- which I believe they will),
3) come up with new, forth element (awful).
The benefit of 1) is that the element will report the same number as
'free' ran within guest. Problem is that I'd have to change all places
where we take the size of the balloon device directly and add sum of
actual sizes of virtio-mem devices (e.g. all stats APIs, ballon changed
event that we fire when we see qemu's event). And okay, if I do that -
then the only way to learn true value of balloon would be to take
current memory and subtract the sum of virtio-mem actual sizes. Which is
like option 2) but reversed. And since I could not decide which way to
go, I made it problem of future me by forbidding balloon and virtio-mem
coexistence.
Michal
On 22.02.21 16:46, Michal Privoznik wrote:
> On 2/18/21 4:00 PM, David Hildenbrand wrote:
>> On 18.02.21 14:31, Michal Privoznik wrote:
>
>>>
>>> Since now we have (possibly) two or more devices that allow
>>> memory inflation/deflation and accounting for all of them (and
>>> thus keeping <currentMemory/> updated) might be hard. Therefore,
>>> I'm deliberately forbidding memballoon. It's okay - virtio-mem is
>>> superior to memballoon anyway. We can always reevaluate later.
>>
>> That's a bad idea. It'll still be used for getting memory stats, free
>> page hinting and free page reporting.
>>
>> Very weird use cases might even want to mix balloon inflation/deflation
>> with virtio-mem ...
>>
>
> I understand that, but then accounting for memory at libvirt level
> becomes hard. What I mean, currently we have three elements for
> reporting guest memory:
>
> <domain>
> <maxMemory slots='16' unit='KiB'>1524288</maxMemory>
> <memory unit='KiB'>524288</memory>
> <currentMemory unit='KiB'>524288</currentMemory>
> ...
> </domain>
>
> <maxMemory/> is for memory hotplug (corresponds to -m maxmem=X,slots=Y)
>
> <memory/> is the initial amount guest has PLUS sum of sizes of all
> memory devices. For instance if I want to give 1GiB initial amount and
> have 1 pc-dimm of 2GiB size this element will report 3GiB.
>
> <currentMemory/> then reflects actual balloon size (gets updated on
> BALLOON_CHANGE event and when libvirt finds fit via query-balloon).
>
> In my experiments, balloon does not account for virtio-mem actual size.
> Therefore, I can see following options:
>
> 1) fix <currentMemory/> so that it does balloon size +
> sum(virtio-mem.actual), or
>
> 2) don't touch <currentMemory/> at all and let mgmt apps compute the sum
> themselves (should they need it- which I believe they will),
>
> 3) come up with new, forth element (awful).
>
> The benefit of 1) is that the element will report the same number as
> 'free' ran within guest. Problem is that I'd have to change all places
> where we take the size of the balloon device directly and add sum of
> actual sizes of virtio-mem devices (e.g. all stats APIs, ballon changed
> event that we fire when we see qemu's event). And okay, if I do that -
> then the only way to learn true value of balloon would be to take
> current memory and subtract the sum of virtio-mem actual sizes. Which is
> like option 2) but reversed. And since I could not decide which way to
> go, I made it problem of future me by forbidding balloon and virtio-mem
> coexistence.
In QEMU, we could make "info balloon" etc. also include virtio-mem
provided memory.
The main reason I did not do so initially is that
1) It's racy when reading/writing the balloon size. The QEMU interface
is broken as we don't get/set the size of the balloon size but
instead the logical VM size. While someone sets the logical VM size
via virtio-balloon, the logical VM size might change due to virtio-
mem guest activity.
2) I don't want people to be tempted to use both at the same
time.
Maybe the right think to do is make "info balloon" report the current
logical VM size. If there are races, bad luck - better not use both
things at the same time.
--
Thanks,
David / dhildenb
On Mon, Feb 22, 2021 at 16:57:11 +0100, David Hildenbrand wrote: > On 22.02.21 16:46, Michal Privoznik wrote: > > On 2/18/21 4:00 PM, David Hildenbrand wrote: > > > On 18.02.21 14:31, Michal Privoznik wrote: [...] > In QEMU, we could make "info balloon" etc. also include virtio-mem provided > memory. > > The main reason I did not do so initially is that > 1) It's racy when reading/writing the balloon size. The QEMU interface > is broken as we don't get/set the size of the balloon size but > instead the logical VM size. While someone sets the logical VM size > via virtio-balloon, the logical VM size might change due to virtio- > mem guest activity. > 2) I don't want people to be tempted to use both at the same > time. > > Maybe the right think to do is make "info balloon" report the current > logical VM size. If there are races, bad luck - better not use both things > at the same time. We specifically don't call info balloon or any equivalent nowadays if qemu supports the balloon event and we should keep it like that at least in terms of the active XML. The bulk stats functions we have already call the monitor so it's okay to do it there. Both virtio-mem and balloon do have events so we should be able to update any required definition bits without an explicit call to the balloon info apart from the stats code and the refresh on reconnect.
On 22.02.21 17:28, Peter Krempa wrote: > On Mon, Feb 22, 2021 at 16:57:11 +0100, David Hildenbrand wrote: >> On 22.02.21 16:46, Michal Privoznik wrote: >>> On 2/18/21 4:00 PM, David Hildenbrand wrote: >>>> On 18.02.21 14:31, Michal Privoznik wrote: > > [...] > >> In QEMU, we could make "info balloon" etc. also include virtio-mem provided >> memory. >> >> The main reason I did not do so initially is that >> 1) It's racy when reading/writing the balloon size. The QEMU interface >> is broken as we don't get/set the size of the balloon size but >> instead the logical VM size. While someone sets the logical VM size >> via virtio-balloon, the logical VM size might change due to virtio- >> mem guest activity. >> 2) I don't want people to be tempted to use both at the same >> time. >> >> Maybe the right think to do is make "info balloon" report the current >> logical VM size. If there are races, bad luck - better not use both things >> at the same time. > > We specifically don't call info balloon or any equivalent nowadays if > qemu supports the balloon event and we should keep it like that at least > in terms of the active XML. > > The bulk stats functions we have already call the monitor so it's okay > to do it there. > > Both virtio-mem and balloon do have events so we should be able to > update any required definition bits without an explicit call to the > balloon info apart from the stats code and the refresh on reconnect. > Just to clarify, any BALLOON communication with the outer world (set/get/balloon_change) are in terms of "logical VM size", not "balloon size". logical_vm_size = vm_size - balloon_size vm_size does currently not include virtio-mem provided memory. -- Thanks, David / dhildenb
On Thu, Feb 18, 2021 at 04:00:05PM +0100, David Hildenbrand wrote:
> On 18.02.21 14:31, Michal Privoznik wrote:
> > The virtio-mem is paravirtualized mechanism of adding/removing
> > memory to/from a VM. A virtio-mem-pci device is split into blocks
> > of equal size which are then exposed (all or only a requested
> > portion of them) to the guest kernel to use as regular memory.
> > Therefore, the device has two important attributes:
> >
> > 1) block-size, which defines the size of a block
> > 2) requested-size, which defines how much memory (in bytes)
> > is the device requested to expose to the guest.
> >
> > The 'block-size' is configured on command line and immutable
> > throughout device's lifetime. The 'requested-size' can be set on
> > the command line too, but also is adjustable via monitor. In
> > fact, that is how management software places its requests to
> > change the memory allocation. If it wants to give more memory to
> > the guest it changes 'requested-size' to a bigger value, and if it
> > wants to shrink guest memory it changes the 'requested-size' to a
> > smaller value. Note, value of zero means that guest should
> > release all memory offered by the device. Of course, guest has to
> > cooperate. Therefore, there is a third attribute 'size' which is
> > read only and reflects how much memory the guest still has. This
> > can be different to 'requested-size', obviously. Because of name
> > clash, I've named it 'actualsize' and it is dealt with in future
> > commits (it is a runtime information anyway).
> >
> > In the backend, memory for virtio-mem is backed by usual objects:
> > memory-backend-{ram,file,memfd} and their size puts the cap on
> > the amount of memory that a virtio-mem device can offer to a
> > guest. But we are already able to express this info using <size/>
> > under <target/>.
> >
> > Therefore, we need only two more elements to cover 'block-size'
> > and 'requested-size' attributes. This is the XML I've came up
> > with:
> >
> > <memory model='virtio-mem'>
> > <source>
> > <nodemask>1-3</nodemask>
> > <pagesize unit='KiB'>2048</pagesize>
> > </source>
> > <target>
> > <size unit='KiB'>2097152</size>
> > <node>0</node>
> > <block unit='KiB'>2048</block>
> > <requested unit='KiB'>1048576</requested>
> > </target>
> > <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> > </memory>
> >
> > I hope by now it is obvious that:
> >
> > 1) 'requested-size' must be an integer multiple of
> > 'block-size', and
> > 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
> > address.
> >
> > Then there is a limitation that the minimal 'block-size' is
> > transparent huge page size (I'll leave this without explanation).
> >
> > Since now we have (possibly) two or more devices that allow
> > memory inflation/deflation and accounting for all of them (and
> > thus keeping <currentMemory/> updated) might be hard. Therefore,
> > I'm deliberately forbidding memballoon. It's okay - virtio-mem is
> > superior to memballoon anyway. We can always reevaluate later.
>
> That's a bad idea. It'll still be used for getting memory stats, free page
> hinting and free page reporting.
Yep, and this feature is broadly used in mgmt apps, so if we enforce this
mutual exclusion it is putting apps into a no-win scenario.
> Very weird use cases might even want to mix balloon inflation/deflation with
> virtio-mem ...
I'd feel less bad about blocking inflation/deflation, but that does go
against the general libvirt POV that we should leave usage policies upto
the mgmt app to decide upon.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Feb 18, 2021 at 14:31:00 +0100, Michal Privoznik wrote:
> The virtio-mem is paravirtualized mechanism of adding/removing
> memory to/from a VM. A virtio-mem-pci device is split into blocks
> of equal size which are then exposed (all or only a requested
> portion of them) to the guest kernel to use as regular memory.
> Therefore, the device has two important attributes:
>
> 1) block-size, which defines the size of a block
> 2) requested-size, which defines how much memory (in bytes)
> is the device requested to expose to the guest.
>
> The 'block-size' is configured on command line and immutable
> throughout device's lifetime. The 'requested-size' can be set on
> the command line too, but also is adjustable via monitor. In
> fact, that is how management software places its requests to
> change the memory allocation. If it wants to give more memory to
> the guest it changes 'requested-size' to a bigger value, and if it
> wants to shrink guest memory it changes the 'requested-size' to a
> smaller value. Note, value of zero means that guest should
> release all memory offered by the device. Of course, guest has to
> cooperate. Therefore, there is a third attribute 'size' which is
> read only and reflects how much memory the guest still has. This
> can be different to 'requested-size', obviously. Because of name
> clash, I've named it 'actualsize' and it is dealt with in future
> commits (it is a runtime information anyway).
>
> In the backend, memory for virtio-mem is backed by usual objects:
> memory-backend-{ram,file,memfd} and their size puts the cap on
> the amount of memory that a virtio-mem device can offer to a
> guest. But we are already able to express this info using <size/>
> under <target/>.
>
> Therefore, we need only two more elements to cover 'block-size'
> and 'requested-size' attributes. This is the XML I've came up
> with:
>
> <memory model='virtio-mem'>
> <source>
> <nodemask>1-3</nodemask>
> <pagesize unit='KiB'>2048</pagesize>
> </source>
> <target>
> <size unit='KiB'>2097152</size>
> <node>0</node>
> <block unit='KiB'>2048</block>
> <requested unit='KiB'>1048576</requested>
> </target>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> </memory>
>
> I hope by now it is obvious that:
>
> 1) 'requested-size' must be an integer multiple of
> 'block-size', and
> 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
> address.
>
> Then there is a limitation that the minimal 'block-size' is
> transparent huge page size (I'll leave this without explanation).
>
> Since now we have (possibly) two or more devices that allow
> memory inflation/deflation and accounting for all of them (and
> thus keeping <currentMemory/> updated) might be hard. Therefore,
> I'm deliberately forbidding memballoon. It's okay - virtio-mem is
> superior to memballoon anyway. We can always reevaluate later.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
> docs/formatdomain.rst | 51 +++++++++++++--
> docs/schemas/domaincommon.rng | 11 ++++
> src/conf/domain_conf.c | 53 ++++++++++++++-
> src/conf/domain_conf.h | 3 +
> src/conf/domain_validate.c | 39 +++++++++++
> src/qemu/qemu_alias.c | 1 +
> src/qemu/qemu_command.c | 1 +
> src/qemu/qemu_domain.c | 27 +++++++-
> src/qemu/qemu_domain_address.c | 38 ++++++++---
> src/qemu/qemu_process.c | 2 +
> src/qemu/qemu_validate.c | 22 ++++++-
> src/security/security_apparmor.c | 1 +
> src/security/security_dac.c | 2 +
> src/security/security_selinux.c | 2 +
> .../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++
> ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 +
> tests/qemuxml2xmltest.c | 1 +
> 17 files changed, 298 insertions(+), 21 deletions(-)
> create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
> create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
>
> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> index 2587106191..05d359a32d 100644
> --- a/docs/formatdomain.rst
> +++ b/docs/formatdomain.rst
[...]
> @@ -6935,6 +6937,16 @@ Example: automatically added device with KVM
> </devices>
> ...
>
> +Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device:
> +
> +::
> +
> + ...
> + <devices>
The example is missing the 'virtio-mem' device which causes the balloon
to be disabled.
> + <memballoon model='none'/>
> + </devices>
> + ...
> +
> Example: manually added device with static PCI slot 2 requested
>
> ::
[...]
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 541d592bbe..26b9b5583e 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def,
> }
>
> if (addDefaultMemballoon && !def->memballoon) {
> - virDomainMemballoonDefPtr memballoon;
> - memballoon = g_new0(virDomainMemballoonDef, 1);
> + virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1);
> + size_t i;
>
> - memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO;
> + /* To simplify virtio-mem implementation, memballoon has to be turned
> + * off if domain has a virtio-mem device. See
> + * qemuValidateDomainDeviceDefMemory() for more details. */
> + for (i = 0; i < def->nmems; i++) {
> + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
> + break;
> + }
> +
> + if (i == def->nmems)
> + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO;
> + else
> + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE;
> def->memballoon = memballoon;
> }
This might be a point of contention. I can see why we'd want to do this
but people might have other ideas. Preferably I'd like danpb's input.
I think the best approach will be to not hide it into the massive patch
adding the boring XML bits but split it separately.
[...]
> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> index 2541ae856a..6150083251 100644
> --- a/src/qemu/qemu_validate.c
> +++ b/src/qemu/qemu_validate.c
[...]
> @@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem,
> }
> break;
>
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> + /* Accounting balloon and virtio-mem is hard. We have plenty of APIs
> + * which take balloon from QEMU and report it to users. We would have
> + * to change all that and account for virtio-mem actual size. Also,
> + * virtio-mem is supposed to be replacement for balloon. Disable
> + * coexistence of these two for now. */
> + if (virDomainDefHasMemballoon(def)) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> + _("virtio-mem is not supported with memory balloon"));
> + return -1;
> + }
> +
> + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> + _("virtio-mem isn't supported by this QEMU binary"));
> + return -1;
> + }
> + break;
And this one too. I agree with this hunk as is, but not in this commit.
© 2016 - 2026 Red Hat, Inc.