This series makes the qcow2 L2 cache cover the entire image by default.
The importance of this change is in noticeable performance improvement,
especially with heavy random I/O. The memory overhead is very small:
only 1 MB of cache for every 8 GB of image size. On systems with very
limited RAM the maximal cache size can be limited by the existing
cache-size and l2-cache-size options.

The L2 cache is also resized accordingly, by default, if the image is
resized.

The reasons for making this behavior default, unlike in the patches I
have sent previously, are the following:
1) Inelegant complications with making an option to accept either a
   size or a string, while outputting a proper error message.
2) More bulky logic to sort out what to do if the image is being resized
   but the (defined) overall cache size is too small to contain the
   new l2-cache-size.
(Making this behavior default resolves all of these technical issues
neatly)
3) The performance gain (as measured by fio in random read/write tests)
   can be as high as 50%, or even more, so this would be a reasonable
   default behavior.
4) The memory overhead is really small for the gain, and in cases when
   memory economy is critical, the maximal cache values can always be
   set by the appropriate options.

Leonid Bloch (6):
  Update .gitignore
  qcow2: A grammar fix in conflicting cache sizing error message
  qcow2: Options' documentation fixes
  qcow2: Update total_sectors when resizing the image
  qcow2: Make the default L2 cache sufficient to cover the entire image
  qcow2: Resize the cache upon image resizing

 .gitignore                 |  1 +
 block/qcow2.c              | 20 +++++++++++++-------
 block/qcow2.h              |  4 ----
 docs/qcow2-cache.txt       | 30 +++++++++++++++++-------------
 qapi/block-core.json       |  6 +++---
 qemu-options.hx            | 15 +++++++++------
 tests/qemu-iotests/103.out |  4 ++--
 tests/qemu-iotests/137.out |  2 +-
 8 files changed, 46 insertions(+), 36 deletions(-)

-- 
2.17.1
Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> This series makes the qcow2 L2 cache cover the entire image by default.
> The importance of this change is in noticeable performance improvement,
> especially with heavy random I/O. The memory overhead is very small:
> only 1 MB of cache for every 8 GB of image size. On systems with very
> limited RAM the maximal cache size can be limited by the existing
> cache-size and l2-cache-size options.
>
> The L2 cache is also resized accordingly, by default, if the image is
> resized.
I agree with changing the defaults, I would have proposed a change
myself soon. We have been offering cache size options for a long time,
and management tools are still ignoring them. So we need to do something
in QEMU.
Now, what the exact defaults should be, is something to use a bit more
thought on. You say "the memory overhead is very small", without going
into details. Let's check.
This is the formula from your patch (unit is bytes):
uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
So we get:
64k clusters, 8G image: 1M (maximum covered by old default)
64k clusters, 1T image: 128M
1M clusters, 1T image: 8M
512b clusters, 1T image: 16G
1T is by far not the maximum size for a qcow2 image, and 16G memory
usage is definitely not "very small" overhead. Even the 128M for the
default cluster size are already a lot. So the simplistic approach of
this series won't work as a default.
Let's refine it a bit:
* Basing it on max_l2_cache sounds fine generally
* Cap it at 32 MB (?) by default
* Enable cache-clean-interval=30 (?) by default to compensate a bit for
the higher maximum memory usage
Another thing I just noticed while looking at the code is that
cache-clean-interval only considers blocks that aren't dirty, but
doesn't take measures to get dirty blocks written out, so we depend on
the guest flushing the cache before we can free the memory. Should
we attempt to write unused dirty entries back? Berto?
> The reasons for making this behavior default, unlike in the patches I
> have sent previously, are the following:
> 1) Inelegant complications with making an option to accept either a
> size or a string, while outputting a proper error message.
> 2) More bulky logic to sort out what to do if the image is being resized
> but the (defined) overall cache size is too small to contain the
> new l2-cache-size.
> (Making this behavior default resolves all of these technical issues
> neatly)
The default must always correspond to some explicit setting. We can only
change defaults relatively freely because we can assume that if a user
cared about the setting, they would have specified it explicitly. If
it's not possible to specify a setting explicitly, we can't make this
assumption any more, so we'd be stuck with the default forever.
Now, considering what I wrote above, an alternate wouldn't be the best
way to represent this any more. We do have a cache size again (the 32 MB
cap) even while trying to cover the whole image.
Maybe the right interface with this in mind would be a boolean option
that specifies whether the given cache sizes are exact values (old
semantics) or maximum values, which are limited to what the actual
image size requires. If you do want the "real" full mode, you'd set
l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
better name). The new default would be l2-cache-size=32M,
exact-cache-sizes=false. The old default is cache-size=1M,
exact-cache-sizes=true.
Kevin
> 3) The performance gain (as measured by fio in random read/write tests)
> can be as high as 50%, or even more, so this would be a reasonable
> default behavior.
> 4) The memory overhead is really small for the gain, and in cases when
> memory economy is critical, the maximal cache values can always be
> set by the appropriate options.
On 07/30/2018 01:55 PM, Kevin Wolf wrote:
> Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
>> This series makes the qcow2 L2 cache cover the entire image by default.
>> The importance of this change is in noticeable performance improvement,
>> especially with heavy random I/O. The memory overhead is very small:
>> only 1 MB of cache for every 8 GB of image size. On systems with very
>> limited RAM the maximal cache size can be limited by the existing
>> cache-size and l2-cache-size options.
>>
>> The L2 cache is also resized accordingly, by default, if the image is
>> resized.
>
> I agree with changing the defaults, I would have proposed a change
> myself soon. We have been offering cache size options for a long time,
> and management tools are still ignoring them. So we need to do something
> in QEMU.
>
> Now, what the exact defaults should be, is something to use a bit more
> thought on. You say "the memory overhead is very small", without going
> into details. Let's check.
>
> This is the formula from your patch (unit is bytes):
>
>     uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
>
> So we get:
>
>     64k clusters,  8G image:   1M (maximum covered by old default)
>     64k clusters,  1T image: 128M
>     1M clusters,   1T image:   8M
>     512b clusters, 1T image:  16G
>
> 1T is by far not the maximum size for a qcow2 image, and 16G memory
> usage is definitely not "very small" overhead. Even the 128M for the
> default cluster size are already a lot. So the simplistic approach of
> this series won't work as a default.
>
> Let's refine it a bit:
>
> * Basing it on max_l2_cache sounds fine generally
> * Cap it at 32 MB (?) by default

Yes! A great idea! Definitely necessary.

> * Enable cache-clean-interval=30 (?) by default to compensate a bit for
>   the higher maximum memory usage

Do you think that we need this if we cap the cache at 32 MB? And that's
only the cap. 256 GB+ images are not that often used. And compared to
the overall QEMU overhead, of over 500 MB, an extra 32 MB in the worst
case is not that much, considering the performance gain.

> Another thing I just noticed while looking at the code is that
> cache-clean-interval only considers blocks that aren't dirty, but
> doesn't take measures to get dirty blocks written out, so we depend on
> the guest flushing the cache before we can free the memory. Should
> we attempt to write unused dirty entries back? Berto?
>
>> The reasons for making this behavior default, unlike in the patches I
>> have sent previously, are the following:
>> 1) Inelegant complications with making an option to accept either a
>>    size or a string, while outputting a proper error message.
>> 2) More bulky logic to sort out what to do if the image is being resized
>>    but the (defined) overall cache size is too small to contain the
>>    new l2-cache-size.
>> (Making this behavior default resolves all of these technical issues
>> neatly)
>
> The default must always correspond to some explicit setting. We can only
> change defaults relatively freely because we can assume that if a user
> cared about the setting, they would have specified it explicitly. If
> it's not possible to specify a setting explicitly, we can't make this
> assumption any more, so we'd be stuck with the default forever.
>
> Now, considering what I wrote above, an alternate wouldn't be the best
> way to represent this any more. We do have a cache size again (the 32 MB
> cap) even while trying to cover the whole image.
>
> Maybe the right interface with this in mind would be a boolean option
> that specifies whether the given cache sizes are exact values (old
> semantics) or maximum values, which are limited to what the actual
> image size requires. If you do want the "real" full mode, you'd set
> l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> better name). The new default would be l2-cache-size=32M,
> exact-cache-sizes=false. The old default is cache-size=1M,
> exact-cache-sizes=true.

The existing cache-size and l2-cache-size options are already documented
as maximal values. It would make sense to actually make them as such:
the caches will be as big as necessary to cover the entire image (no
need for them to be more than that) but they will be capped by the
current options, while the new default of l2-cache-size will be 32M.
Why would one need exact-cache-sizes, if they would be MIN(needed, max)?

Does it sound reasonable?

Leonid.

> Kevin
>
>> 3) The performance gain (as measured by fio in random read/write tests)
>>    can be as high as 50%, or even more, so this would be a reasonable
>>    default behavior.
>> 4) The memory overhead is really small for the gain, and in cases when
>>    memory economy is critical, the maximal cache values can always be
>>    set by the appropriate options.
Am 30.07.2018 um 14:13 hat Leonid Bloch geschrieben:
> On 07/30/2018 01:55 PM, Kevin Wolf wrote:
> > Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> > > This series makes the qcow2 L2 cache cover the entire image by default.
> > > The importance of this change is in noticeable performance improvement,
> > > especially with heavy random I/O. The memory overhead is very small:
> > > only 1 MB of cache for every 8 GB of image size. On systems with very
> > > limited RAM the maximal cache size can be limited by the existing
> > > cache-size and l2-cache-size options.
> > >
> > > The L2 cache is also resized accordingly, by default, if the image is
> > > resized.
> >
> > I agree with changing the defaults, I would have proposed a change
> > myself soon. We have been offering cache size options for a long time,
> > and management tools are still ignoring them. So we need to do something
> > in QEMU.
> >
> > Now, what the exact defaults should be, is something to use a bit more
> > thought on. You say "the memory overhead is very small", without going
> > into details. Let's check.
> >
> > This is the formula from your patch (unit is bytes):
> >
> >     uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> >
> > So we get:
> >
> >     64k clusters,  8G image:   1M (maximum covered by old default)
> >     64k clusters,  1T image: 128M
> >     1M clusters,   1T image:   8M
> >     512b clusters, 1T image:  16G
> >
> > 1T is by far not the maximum size for a qcow2 image, and 16G memory
> > usage is definitely not "very small" overhead. Even the 128M for the
> > default cluster size are already a lot. So the simplistic approach of
> > this series won't work as a default.
> >
> > Let's refine it a bit:
> >
> > * Basing it on max_l2_cache sounds fine generally
> > * Cap it at 32 MB (?) by default
>
> Yes! A great idea! Definitely necessary.
>
> > * Enable cache-clean-interval=30 (?) by default to compensate a bit for
> >   the higher maximum memory usage
>
> Do you think that we need this if we cap the cache at 32 MB? And that's
> only the cap. 256 GB+ images are not that often used. And compared to
> the overall QEMU overhead, of over 500 MB, an extra 32 MB in the worst
> case is not that much, considering the performance gain.

Don't forget that if you take a few snapshots so you get a backing file
chain, every layer in the chain has its own cache. So capping at 32 MB
with 9 backing files still means that we use 320 MB.

And honestly, is there a real reason not to use cache-clean-interval by
default? Even if it doesn't help much in some cases, it should hardly
ever hurt. If the image was completely idle for 30 seconds (or whatever
number we choose), having to reload some 64k from the disk on the next
access shouldn't make a big difference.

> > Another thing I just noticed while looking at the code is that
> > cache-clean-interval only considers blocks that aren't dirty, but
> > doesn't take measures to get dirty blocks written out, so we depend on
> > the guest flushing the cache before we can free the memory. Should
> > we attempt to write unused dirty entries back? Berto?
> >
> > > The reasons for making this behavior default, unlike in the patches I
> > > have sent previously, are the following:
> > > 1) Inelegant complications with making an option to accept either a
> > >    size or a string, while outputting a proper error message.
> > > 2) More bulky logic to sort out what to do if the image is being resized
> > >    but the (defined) overall cache size is too small to contain the
> > >    new l2-cache-size.
> > > (Making this behavior default resolves all of these technical issues
> > > neatly)
> >
> > The default must always correspond to some explicit setting. We can only
> > change defaults relatively freely because we can assume that if a user
> > cared about the setting, they would have specified it explicitly. If
> > it's not possible to specify a setting explicitly, we can't make this
> > assumption any more, so we'd be stuck with the default forever.
> >
> > Now, considering what I wrote above, an alternate wouldn't be the best
> > way to represent this any more. We do have a cache size again (the 32 MB
> > cap) even while trying to cover the whole image.
> >
> > Maybe the right interface with this in mind would be a boolean option
> > that specifies whether the given cache sizes are exact values (old
> > semantics) or maximum values, which are limited to what the actual
> > image size requires. If you do want the "real" full mode, you'd set
> > l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> > better name). The new default would be l2-cache-size=32M,
> > exact-cache-sizes=false. The old default is cache-size=1M,
> > exact-cache-sizes=true.
>
> The existing cache-size and l2-cache-size options are already documented
> as maximal values. It would make sense to actually make them as such:
> the caches will be as big as necessary to cover the entire image (no
> need for them to be more than that) but they will be capped by the
> current options, while the new default of l2-cache-size will be 32M.
> Why would one need exact-cache-sizes, if they would be MIN(needed, max)?
>
> Does it sound reasonable?

I think you have a point there.

There is one case where we need to cover more than the virtual disk
size, and that is the VM state for internal snapshots. These are
sequential accesses, though, so it doesn't matter for it much how many
L2 tables we have cached. And having to reload some tables after taking
or loading a snapshot sounds okay, too.

So essentially everything boils down to applying max_l2_cache even if a
L2 size was explicitly specified (which makes cache-size=INT64_MAX a
reasonable option), changing the default from cache-size=1M to
cache-size=32M (covers 256 GB with the default 64k clusters), and
enabling cache-clean-interval by default.

Right?

Kevin
On Mon 30 Jul 2018 12:55:22 PM CEST, Kevin Wolf wrote:
> I agree with changing the defaults, I would have proposed a change
> myself soon. We have been offering cache size options for a long time,
> and management tools are still ignoring them. So we need to do
> something in QEMU.
Indeed, there's been a bit of discussion in the mailing list and here:
https://bugzilla.redhat.com/show_bug.cgi?id=1377735
There are some proposals but I haven't seen one that looks evidently
like the best choice yet.
> Now, what the exact defaults should be, is something to use a bit more
> thought on. You say "the memory overhead is very small", without going
> into details. Let's check.
Thanks for showing these numbers, Kevin. The memory overhead is
definitely not very small, and as a matter of fact a significant part of
the work that I've been doing in the past couple of years has the goal
of reducing that memory overhead, which is very significant if you have
large images and long backing chains.
> * Cap it at 32 MB (?) by default
> * Enable cache-clean-interval=30 (?) by default to compensate a bit for
> the higher maximum memory usage
I think this seems like a reasonable approach. One question that I was
asked already was "Why is cache-clean-interval not enabled by default?".
I don't think the performance impact is a problem (unless the interval
is too low of course), the only thing that I could think of is that it
could make the memory usage more unpredictable.
> Another thing I just noticed while looking at the code is that
> cache-clean-interval only considers blocks that aren't dirty, but
> doesn't take measures to get dirty blocks written out, so we depend on
> the guest flushing the cache before we can get free the memory. Should
> we attempt to write unused dirty entries back? Berto?
I never thought about it, but it sounds worth exploring.
> Maybe the right interface with this in mind would be a boolean option
> that specifies whether the given cache sizes are exact values (old
> semantics) or maximum values, which are limited to what the actual
> images size requires. If you do want the "real" full mode, you'd set
> l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> better name). The new default would be l2-cache-size=32M,
> exact-cache-sizes=false. The old default is cache-size=1M,
> exact-cache-sizes=true.
That sounds quite complicated ... plus with the current ("exact values")
semantics l2-cache-size already represents a "maximum" since cache
entries are only filled on demand. That is, you can set up a 128MB L2
cache but most of that memory won't be used unless you fill it up first.
So, if you have an 8GB qcow2 image, having a 1MB L2 cache (enough for
the whole image) or a 100MB one shouldn't be very different, because in
the latter case only the first MB is going to be used in practice.
The only actual difference is the overhead of having larger data
structures (Qcow2Cache.entries and Qcow2Cache.table_array).
And thinking about this, perhaps this could be the simplest approach of
them all: let the user pass any value to l2-cache-size but then in
read_cache_sizes() do something like
    if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
        *l2_cache_size = max_l2_cache;
    }
Then if the image is resized you can recalculate this.
This way the user can make l2-cache-size the hard maximum that they ever
are willing to use for an image's L2 cache, and QEMU guarantees that it
won't allocate that much memory if it can cover the whole image with
less.
How does that sound?
Berto
Am 03.08.2018 um 15:37 hat Alberto Garcia geschrieben:
> On Mon 30 Jul 2018 12:55:22 PM CEST, Kevin Wolf wrote:
> > I agree with changing the defaults, I would have proposed a change
> > myself soon. We have been offering cache size options for a long time,
> > and management tools are still ignoring them. So we need to do
> > something in QEMU.
>
> Indeed, there's been a bit of discussion in the mailing list and here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1377735
>
> There are some proposals but I haven't seen one that looks evidently
> like the best choice yet.
>
> > Now, what the exact defaults should be, is something to use a bit more
> > thought on. You say "the memory overhead is very small", without going
> > into details. Let's check.
>
> Thanks for showing these numbers, Kevin. The memory overhead is
> definitely not very small, and as a matter of fact a significant part of
> the work that I've been doing in the past couple of years has the goal
> of reducing that memory overhead, which is very significant if you have
> large images and long backing chains.
The actual overhead is probably quite small in the common cases, we just
must be careful that corner cases don't completely break.
By the way, weren't you working on subclusters a while ago? How did that
go? Because I think those would enable us to use larger cluster sizes
and therefore reduce the metadata sizes as well.
> > * Cap it at 32 MB (?) by default
> > * Enable cache-clean-interval=30 (?) by default to compensate a bit for
> > the higher maximum memory usage
>
> I think this seems like a reasonable approach. One question that I was
> asked already was "Why is cache-clean-interval not enabled by default?".
>
> I don't think the performance impact is a problem (unless the interval
> is too low of course), the only thing that I could think of is that it
> could make the memory usage more unpredictable.
Yes, but I think being more unpredictable is better than being
constantly wrong. Always using more cache just because it would be more
predictable sounds silly, and using too little cache results in bad
performance. So something adaptive seems right to me.
In fact, we already have significant additional memory use under high I/O
load because we allocate many coroutines with a 1 MB stack each. Caches
that also grow a bit under load should be bearable, I hope.
In the end, we only have a choice between bad performance for random I/O
across the whole disk or throwing more memory at the problem. Let's
choose something that performs well by default and if someone wants to
save memory, they can still set the options.
> > Another thing I just noticed while looking at the code is that
> > cache-clean-interval only considers blocks that aren't dirty, but
> > doesn't take measures to get dirty blocks written out, so we depend on
> > the guest flushing the cache before we can get free the memory. Should
> > we attempt to write unused dirty entries back? Berto?
>
> I never thought about it, but it sounds worth exploring.
Are you planning to do that or should I add it to my own todo list?
> > Maybe the right interface with this in mind would be a boolean option
> > that specifies whether the given cache sizes are exact values (old
> > semantics) or maximum values, which are limited to what the actual
> > images size requires. If you do want the "real" full mode, you'd set
> > l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> > better name). The new default would be l2-cache-size=32M,
> > exact-cache-sizes=false. The old default is cache-size=1M,
> > exact-cache-sizes=true.
>
> That sounds quite complicated ... plus with the current ("exact values")
> semantics l2-cache-size already represents a "maximum" since cache
> entries are only filled on demand. That is, you can set up a 128MB L2
> cache but most of that memory won't be used unless you fill it up first.
>
> So, if you have an 8GB qcow2 image, having a 1MB L2 cache (enough for
> the whole image) or a 100MB one shouldn't be very different, because in
> the latter case only the first MB is going to be used in practice.
>
> The only actual difference is the overhead of having larger data
> structures (Qcow2Cache.entries and Qcow2Cache.table_array).
>
> And thinking about this, perhaps this could be the simplest approach of
> them all: let the user pass any value to l2-cache-size but then in
> read_cache_sizes() do something like
>
> if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
> *l2_cache_size = max_l2_cache;
> }
>
> Then if the image is resized you can recalculate this.
>
> This way the user can make l2-cache-size the hard maximum that they ever
> are willing to use for an image's L2 cache, and QEMU guarantees that it
> won't allocate that much memory if it can cover the whole image with
> less.
>
> How does that sound?
Yes. When you read the rest of the thread, you'll see that Leonid
suggested the same and I agree. And if someone really wants to cover the
full image no matter how much memory it requires, they can just pass
INT64_MAX and get the right thing.
Kevin
On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> By the way, weren't you working on subclusters a while ago? How did
> that go? Because I think those would enable us to use larger cluster
> sizes and therefore reduce the metadata sizes as well.
I had a working prototype, but the changes to both the code and the
on-disk format were not trivial. I would need to re-evaluate its
performance impact after all the changes that we have had since then,
and then see if it's worth trying again.
I suppose that the main benefit of having subclusters is that
allocations are much faster. Everything else remains more or less the
same, and in particular you can already use larger clusters if you want
to reduce the metadata sizes. Plus, with the l2-cache-entry-size option
you can already solve some of the problems of having large clusters.
>> > Another thing I just noticed while looking at the code is that
>> > cache-clean-interval only considers blocks that aren't dirty, but
>> > doesn't take measures to get dirty blocks written out, so we depend
>> > on the guest flushing the cache before we can get free the
>> > memory. Should we attempt to write unused dirty entries back?
>> > Berto?
>>
>> I never thought about it, but it sounds worth exploring.
>
> Are you planning to do that or should I add it to my own todo list?
I can do that.
>> And thinking about this, perhaps this could be the simplest approach of
>> them all: let the user pass any value to l2-cache-size but then in
>> read_cache_sizes() do something like
>>
>> if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
>>     *l2_cache_size = max_l2_cache;
>> }
>>
>> Then if the image is resized you can recalculate this.
>>
>> This way the user can make l2-cache-size the hard maximum that they ever
>> are willing to use for an image's L2 cache, and QEMU guarantees that it
>> won't allocate that much memory if it can cover the whole image with
>> less.
>>
>> How does that sound?
>
> Yes. When you read the rest of the thread, you'll see that Leonid
> suggested the same and I agree. And if someone really wants to cover the
> full image no matter how much memory it requires, they can just pass
> INT64_MAX and get the right thing.
I wonder why it took so long to think about this, it sounds rather
simple when you think about it...
Berto
Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> > By the way, weren't you working on subclusters a while ago? How did
> > that go? Because I think those would enable us to use larger cluster
> > sizes and therefore reduce the metadata sizes as well.
>
> I had a working prototype, but the changes to both the code and the
> on-disk format were not trivial. I would need to re-evaluate its
> performance impact after all the changes that we have had since then,
> and then see if it's worth trying again.
>
> I suppose that the main benefit of having subclusters is that
> allocations are much faster. Everything else remains more or less the
> same, and in particular you can already use larger clusters if you want
> to reduce the metadata sizes. Plus, with the l2-cache-entry-size option
> you can already solve some of the problems of having large clusters.

Yes, indeed, subclusters are about making COW less painful (or getting
rid of it altogether). Doing COW for full 2 MB when the guest updates
4k is just a bit over the top and I think it slows down initial writes
seriously. I haven't benchmarked things in a while, though.

While reasonable cache settings and potentially also avoiding
fragmentation are probably more important overall, I don't think we can
completely ignore initial writes. They are part of the cost of
snapshots, they are what people are seeing first and also what
benchmarks generally show.

Kevin
On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
> Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
>> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
>> > By the way, weren't you working on subclusters a while ago? How did
>> > that go? Because I think those would enable us to use larger
>> > cluster sizes and therefore reduce the metadata sizes as well.
>>
>> I had a working prototype, but the changes to both the code and the
>> on-disk format were not trivial. I would need to re-evaluate its
>> performance impact after all the changes that we have had since then,
>> and then see if it's worth trying again.
>>
>> I suppose that the main benefit of having subclusters is that
>> allocations are much faster. Everything else remains more or less the
>> same, and in particular you can already use larger clusters if you
>> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
>> option you can already solve some of the problems of having large
>> clusters.
>
> Yes, indeed, subclusters are about making COW less painful (or getting
> rid of it altogether). Doing COW for full 2 MB when the guest updates
> 4k is just a bit over the top and I think it slows down initial writes
> seriously. I haven't benchmarked things in a while, though.

Me neither, I think the most recent ones that I have are from last year:

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html

Since then we changed the cow algorithm to do only 2 operations instead
of 5, so that can affect the best case scenario.

> While reasonable cache settings and potentially also avoiding
> fragmentation are probably more important overall, I don't think we
> can completely ignore initial writes. They are part of the cost of
> snapshots, they are what people are seeing first and also what
> benchmarks generally show.

If I have some time I could try to test the patches again on top of
QEMU 3.0 and see what happens.

Berto
Am 06.08.2018 um 13:07 hat Alberto Garcia geschrieben:
> On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
> > Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
> >> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> >> > By the way, weren't you working on subclusters a while ago? How did
> >> > that go? Because I think those would enable us to use larger
> >> > cluster sizes and therefore reduce the metadata sizes as well.
> >>
> >> I had a working prototype, but the changes to both the code and the
> >> on-disk format were not trivial. I would need to re-evaluate its
> >> performance impact after all the changes that we have had since then,
> >> and then see if it's worth trying again.
> >>
> >> I suppose that the main benefit of having subclusters is that
> >> allocations are much faster. Everything else remains more or less the
> >> same, and in particular you can already use larger clusters if you
> >> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
> >> option you can already solve some of the problems of having large
> >> clusters.
> >
> > Yes, indeed, subclusters are about making COW less painful (or getting
> > rid of it altogether). Doing COW for full 2 MB when the guest updates
> > 4k is just a bit over the top and I think it slows down initial writes
> > seriously. I haven't benchmarked things in a while, though.
>
> Me neither, I think the most recent ones that I have are from last year:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html
>
> Since then we changed the COW algorithm to do only 2 operations instead
> of 5, so that can affect the best-case scenario.

I can't find the numbers from the COW request merging any more, but I
remember that when I tested the approach, the result was something like
this: merging the write requests pretty much got rid of any measurable
COW overhead without a backing file, but as soon as you have to read
from the backing file, the saved work isn't very significant any more
and the cost of the read dominates.

So when you test subclusters, be sure to not only test an empty
standalone qcow2 image, but also a newly taken snapshot that has to COW
from the backing file a lot.

> > While reasonable cache settings and potentially also avoiding
> > fragmentation are probably more important overall, I don't think we
> > can completely ignore initial writes. They are part of the cost of
> > snapshots, they are what people are seeing first and also what
> > benchmarks generally show.
>
> If I have some time I could try to test the patches again on top of
> QEMU 3.0 and see what happens.

That sounds good.

Kevin
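[Editor's note: as a rough back-of-the-envelope illustration of the COW
amplification discussed above (not code from any of the patches), the
following sketch compares full-cluster COW against subcluster COW for a
small allocating write. It simplifies by assuming the write fits in one
cluster and touches a single COW granule; the 32-subclusters-per-cluster
figure is only an example value, not taken from this thread.]

```python
def cow_amplification(write_size, cluster_size, n_subclusters=1):
    """Ratio of bytes copied (read from the backing file and written
    out) to bytes the guest actually wrote, for an allocating write.

    With n_subclusters == 1 this models plain qcow2, where the whole
    cluster is the COW unit; with more subclusters only the affected
    granule needs COW.
    """
    granule = cluster_size // n_subclusters
    copied = max(granule, write_size)  # COW at least one granule
    return copied / write_size

# A 4 KiB guest write into a fresh 2 MiB cluster:
print(cow_amplification(4096, 2 * 1024**2))                    # 512x
print(cow_amplification(4096, 2 * 1024**2, n_subclusters=32))  # 16x
```

This is consistent with Kevin's point: without subclusters, a 4 KiB
update to a 2 MiB cluster forces roughly 512 times the guest's data to
be copied, which is why initial writes after taking a snapshot are the
interesting benchmark case.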
On Mon 06 Aug 2018 01:30:33 PM CEST, Kevin Wolf wrote:
> Am 06.08.2018 um 13:07 hat Alberto Garcia geschrieben:
>> On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
>> > Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
>> >> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
>> >> > By the way, weren't you working on subclusters a while ago? How did
>> >> > that go? Because I think those would enable us to use larger
>> >> > cluster sizes and therefore reduce the metadata sizes as well.
>> >>
>> >> I had a working prototype, but the changes to both the code and the
>> >> on-disk format were not trivial. I would need to re-evaluate its
>> >> performance impact after all the changes that we have had since then,
>> >> and then see if it's worth trying again.
>> >>
>> >> I suppose that the main benefit of having subclusters is that
>> >> allocations are much faster. Everything else remains more or less the
>> >> same, and in particular you can already use larger clusters if you
>> >> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
>> >> option you can already solve some of the problems of having large
>> >> clusters.
>> >
>> > Yes, indeed, subclusters are about making COW less painful (or getting
>> > rid of it altogether). Doing COW for full 2 MB when the guest updates
>> > 4k is just a bit over the top and I think it slows down initial writes
>> > seriously. I haven't benchmarked things in a while, though.
>>
>> Me neither, I think the most recent ones that I have are from last year:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html
>>
>> Since then we changed the COW algorithm to do only 2 operations instead
>> of 5, so that can affect the best-case scenario.
>
> I can't find the numbers from the COW request merging any more, but I
> remember that when I tested the approach, the result was something
> like this: merging the write requests pretty much got rid of any
> measurable COW overhead without a backing file, but as soon as you
> have to read from the backing file, the saved work isn't very
> significant any more and the cost of the read dominates.
>
> So when you test subclusters, be sure to not only test an empty
> standalone qcow2 image, but also a newly taken snapshot that has to
> COW from the backing file a lot.

Yes, that's the use case that I had in mind too.

Berto
Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> This series makes the qcow2 L2 cache cover the entire image by default.
> The importance of this change is in noticeable performance improvement,
> especially with heavy random I/O. The memory overhead is very small:
> only 1 MB of cache for every 8 GB of image size. On systems with very
> limited RAM the maximal cache size can be limited by the existing
> cache-size and l2-cache-size options.
>
> The L2 cache is also resized accordingly, by default, if the image is
> resized.

Are you going to send a new version that makes the changes we agreed on?

Kevin
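[Editor's note: the "1 MB of cache for every 8 GB of image size" figure
quoted above follows directly from the qcow2 on-disk layout with the
default 64 KiB cluster size, where each 8-byte L2 entry maps one
cluster. A small sketch of that arithmetic, with an illustrative
function name (not from the series):]

```python
def l2_cache_for_full_coverage(image_size, cluster_size=64 * 1024):
    """Bytes of L2 cache needed so every cluster's mapping is cached.

    Each L2 entry is 8 bytes and maps one cluster, so the cache must
    hold image_size / cluster_size entries. With 64 KiB clusters this
    works out to image_size / 8192, i.e. 1 MiB per 8 GiB of image.
    """
    l2_entry_size = 8
    n_clusters = (image_size + cluster_size - 1) // cluster_size
    return n_clusters * l2_entry_size

print(l2_cache_for_full_coverage(8 * 1024**3))  # 1 MiB for an 8 GiB image
```

Note that larger clusters shrink this proportionally: doubling the
cluster size halves the cache needed for full coverage, which ties back
to the earlier subthread about using larger clusters to reduce metadata
sizes.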
On 08/03/2018 10:38 AM, Kevin Wolf wrote:
> Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
>> This series makes the qcow2 L2 cache cover the entire image by default.
>> The importance of this change is in noticeable performance improvement,
>> especially with heavy random I/O. The memory overhead is very small:
>> only 1 MB of cache for every 8 GB of image size. On systems with very
>> limited RAM the maximal cache size can be limited by the existing
>> cache-size and l2-cache-size options.
>>
>> The L2 cache is also resized accordingly, by default, if the image is
>> resized.
>
> Are you going to send a new version that makes the changes we agreed on?

Yes, sure. Was somewhat busy. Will send today or tomorrow. Thanks for
reminding.

Leonid.

>
> Kevin
>