Hi David,

I already asked you about downtime calculation for postcopy live migration.
As I remember, you said it's not worth calculating it per vCPU, or maybe I
understood you incorrectly. I decided to prove it could be useful.

This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
of the QEMU master branch. It requires the commit "userfaultfd: provide pid
in userfault uffd_msg" from Andrea's git repository.

When I tested it, I found the following moments strange:
1. The first userfault always occurs due to an access to RAM in
vapic_map_rom_writable; all vCPUs are sleeping at that time.
2. The latter half of all userfaults was initiated by kworkers, which is why
I had a doubt about whether "current" in handle_userfault inside the kernel is
the proper task_struct for the pagefault initiator. All vCPUs were sleeping at
that moment.
3. Also there is a discrepancy between the vCPU state and the real vCPU
thread state.

This patch is just for showing an idea; if you are OK with it, the non-RFC
patch will not include /proc access and a lot of traces.
Also I think it's worth keeping postcopy_downtime in MigrationIncomingState
and returning the calculated downtime to the source, where query-migrate will
be invoked.

Alexey Perevalov (2):
  userfault: add pid into uffd_msg
  migration: calculate downtime on dst side

 include/migration/migration.h     |  11 ++
 linux-headers/linux/userfaultfd.h |   1 +
 migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c          |  61 +++++++++-
 migration/savevm.c                |   2 +
 migration/trace-events            |  10 +-
 6 files changed, 319 insertions(+), 4 deletions(-)

--
1.8.3.1
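As context for the series: a minimal sketch of what the destination-side fault
handler gets to see, assuming a kernel that carries the "userfaultfd: provide
pid in userfault uffd_msg" patch (the msg.arg.pagefault.pid field; the name
matches the usage visible in the checkpatch report below). The helper is
illustrative, not code from the series:

#include <linux/userfaultfd.h>
#include <unistd.h>
#include <stdio.h>

/* Illustrative only: read one event from the userfaultfd and report the
 * faulting address plus the faulting thread id added by the kernel patch.
 * A real handler would map the tid to a vCPU index and mark the start of
 * a downtime interval for that vCPU. */
static void report_one_fault(int uffd)
{
    struct uffd_msg msg;

    if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return; /* short read, or not a page fault event */
    }
    printf("fault at 0x%llx from tid %u\n",
           (unsigned long long)msg.arg.pagefault.address,
           (unsigned int)msg.arg.pagefault.pid);
}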
Hi,
This series seems to have some coding style problems. See output below for
more information:
Message-id: 1489850003-5652-1-git-send-email-a.perevalov@samsung.com
Subject: [Qemu-devel] [RFC PATCH 0/2] Calculate downtime for postcopy live migration
Type: series
=== TEST SCRIPT BEGIN ===
#!/bin/bash
BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0
# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True
commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done
exit $failed
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
* [new tag] patchew/1489850003-5652-1-git-send-email-a.perevalov@samsung.com -> patchew/1489850003-5652-1-git-send-email-a.perevalov@samsung.com
Switched to a new branch 'test'
2bb4f0b migration: calculate downtime on dst side
f39835b userfault: add pid into uffd_msg
=== OUTPUT BEGIN ===
Checking PATCH 1/2: userfault: add pid into uffd_msg...
Checking PATCH 2/2: migration: calculate downtime on dst side...
Argument "m" isn't numeric in numeric eq (==) at ./scripts/checkpatch.pl line 2459.
Argument "m" isn't numeric in numeric eq (==) at ./scripts/checkpatch.pl line 2459.
Use of uninitialized value $1 in concatenation (.) or string at ./scripts/checkpatch.pl line 2460.
ERROR: braces {} are necessary for all arms of this statement
#68: FILE: migration/migration.c:129:
+ if (a == b)
[...]
+ else if (a > b)
[...]
ERROR: braces {} are necessary for all arms of this statement
#70: FILE: migration/migration.c:131:
+ else if (a > b)
[...]
ERROR: unnecessary cast may hide bugs, use g_new0 instead
#131: FILE: migration/migration.c:2149:
+ dd = (DowntimeDuration *)g_malloc0(sizeof(DowntimeDuration));
ERROR: braces {} are necessary for all arms of this statement
#135: FILE: migration/migration.c:2153:
+ if (cpu < 0)
[...]
+ else
[...]
WARNING: line over 80 characters
#160: FILE: migration/migration.c:2178:
+ /* error_report("Could not populate downtime duration completion time \n\
ERROR: unnecessary whitespace before a quoted newline
#160: FILE: migration/migration.c:2178:
+ /* error_report("Could not populate downtime duration completion time \n\
ERROR: Error messages should not contain newlines
#160: FILE: migration/migration.c:2178:
+ /* error_report("Could not populate downtime duration completion time \n\
ERROR: braces {} are necessary for all arms of this statement
#193: FILE: migration/migration.c:2211:
+ if (dd->end && dd->begin)
[...]
WARNING: line over 80 characters
#194: FILE: migration/migration.c:2212:
+ trace_sumup_downtime_duration(dd->end - dd->begin, (uint64_t)key, dd->cpus);
ERROR: braces {} are necessary for all arms of this statement
#205: FILE: migration/migration.c:2223:
+ if (test_bit(cpu_iter, &dd->cpus) && dd->end && dd->begin)
[...]
ERROR: that open brace { should be on the previous line
#222: FILE: migration/migration.c:2240:
+ for (i = 0; i < smp_cpus; i++)
+ {
ERROR: braces {} are necessary even for single statement blocks
#222: FILE: migration/migration.c:2240:
+ for (i = 0; i < smp_cpus; i++)
+ {
+ set_bit(i, &sufficient_cpus);
+ }
ERROR: code indent should never use tabs
#224: FILE: migration/migration.c:2242:
+^Iset_bit(i, &sufficient_cpus);$
WARNING: line over 80 characters
#262: FILE: migration/migration.c:2280:
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3,
ERROR: that open brace { should be on the previous line
#288: FILE: migration/migration.c:2306:
+ for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+ {
ERROR: braces {} are necessary even for single statement blocks
#288: FILE: migration/migration.c:2306:
+ for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+ {
+ trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
+ }
ERROR: braces {} are necessary for all arms of this statement
#302: FILE: migration/migration.c:2320:
+ if (!od->is_end || prev_od->is_end)
[...]
ERROR: braces {} are necessary for all arms of this statement
#311: FILE: migration/migration.c:2329:
+ if (t_od->is_end)
[...]
ERROR: braces {} are necessary for all arms of this statement
#367: FILE: migration/postcopy-ram.c:431:
+ if (strstr(line, "Name"))
[...]
ERROR: braces {} are necessary for all arms of this statement
#369: FILE: migration/postcopy-ram.c:433:
+ if (strstr(line, "State"))
[...]
ERROR: suspect code indent for conditional statements (8, 11)
#382: FILE: migration/postcopy-ram.c:446:
+ if (cpu_iter->thread_id == pid)
+ return cpu_iter->cpu_index;
ERROR: braces {} are necessary for all arms of this statement
#382: FILE: migration/postcopy-ram.c:446:
+ if (cpu_iter->thread_id == pid)
[...]
WARNING: line over 80 characters
#414: FILE: migration/postcopy-ram.c:540:
+ rb_offset, msg.arg.pagefault.pid);
ERROR: code indent should never use tabs
#416: FILE: migration/postcopy-ram.c:542:
+^Imark_postcopy_downtime_begin(msg.arg.pagefault.address,$
ERROR: code indent should never use tabs
#417: FILE: migration/postcopy-ram.c:543:
+^I^Idefined_mem_fault_cpu_index(msg.arg.pagefault.pid));$
ERROR: code indent should never use tabs
#437: FILE: migration/savevm.c:1633:
+^Itrace_loadvm_postcopy_vm_start(get_postcopy_total_downtime());$
total: 22 errors, 4 warnings, 435 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===
Test command exited with code: 1
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
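To make the report above concrete, here is roughly how the flagged patterns
read once QEMU's style rules are applied - braces on every arm, g_new0()
instead of a cast around g_malloc0(), and spaces rather than tabs. The
comparator shape is an assumption (checkpatch's [...] elisions hide the real
bodies); DowntimeDuration, smp_cpus and sufficient_cpus are the patch's own
names:

/* Assumed shape of the flagged comparison at migration.c:129 (it looks
 * like a three-way key comparator); checkpatch wants braces on every arm: */
static gint addr_compare(gconstpointer a, gconstpointer b, gpointer opaque)
{
    if (a == b) {
        return 0;
    } else if (a > b) {
        return 1;
    } else {
        return -1;
    }
}

/* g_new0() instead of a cast around g_malloc0(): */
DowntimeDuration *dd = g_new0(DowntimeDuration, 1);

/* Brace placement and spaces-not-tabs for the flagged loop: */
for (i = 0; i < smp_cpus; i++) {
    set_bit(i, &sufficient_cpus);
}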
* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Hi David,
>
> I already asked you about downtime calculation for postcopy live migration.
> As I remember, you said it's not worth calculating it per vCPU, or maybe I
> understood you incorrectly. I decided to prove it could be useful.
Thanks - apologies for taking so long to look at it.
Some higher level thoughts:
a) It needs to be switchable - the tree etc look like they could use a fair
amount of RAM.
b) The cpu bitmask is a problem given we can have more than 64 CPUs (see the
   bitmap sketch after this list)
c) Tracing the pages that took the longest can be interesting - I've done
graphs of latencies before - you get fun things like watching messes
where you lose requests and the page eventually arrives anyway after
a few seconds.
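On point (b): a minimal sketch of one way around the 64-CPU limit, assuming
QEMU's dynamically sized bitmap helpers from include/qemu/bitmap.h; the
struct and function names are illustrative, not taken from the patch:

#include "qemu/osdep.h"
#include "qemu/bitmap.h"

/* Illustrative: replace the fixed 'unsigned long' cpu mask with a
 * heap-allocated bitmap so any number of vCPUs works. */
typedef struct DowntimeInterval {
    int64_t begin;
    int64_t end;
    unsigned long *cpus;            /* one bit per vCPU */
} DowntimeInterval;

static DowntimeInterval *downtime_interval_new(int ncpus)
{
    DowntimeInterval *di = g_new0(DowntimeInterval, 1);
    di->cpus = bitmap_new(ncpus);   /* zero-initialised, ncpus bits wide */
    return di;
}

static void downtime_mark_vcpu(DowntimeInterval *di, int cpu)
{
    set_bit(cpu, di->cpus);         /* this vCPU is blocked on a fault */
}

static bool downtime_all_blocked(const DowntimeInterval *di, int ncpus)
{
    return bitmap_full(di->cpus, ncpus);
}

bitmap_new() allocates a zeroed bitmap of any width, so the interval
bookkeeping no longer depends on sizeof(unsigned long).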
> This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
> of the QEMU master branch. It requires the commit "userfaultfd: provide pid
> in userfault uffd_msg" from Andrea's git repository.
>
> When I tested it, I found the following moments strange:
> 1. The first userfault always occurs due to an access to RAM in
> vapic_map_rom_writable; all vCPUs are sleeping at that time.
That's probably not too surprising - I bet the vapic device load code does
that? I've sometimes wondered about preloading the queue on the source with
some pages that we know will need to be loaded early.
> 2. The latter half of all userfaults was initiated by kworkers, which is why
> I had a doubt about whether "current" in handle_userfault inside the kernel
> is the proper task_struct for the pagefault initiator. All vCPUs were
> sleeping at that moment.
When you say kworkers - which ones? I wonder what they are - perhaps incoming network
packets using vhost?
> 3. Also there is a discrepancy between the vCPU state and the real vCPU
> thread state.
What do you mean by that?
> This patch is just for showing an idea; if you are OK with it, the non-RFC
> patch will not include /proc access and a lot of traces.
> Also I think it's worth keeping postcopy_downtime in MigrationIncomingState
> and returning the calculated downtime to the source, where query-migrate
> will be invoked.
I don't think it's worth it; we can always ask the destination, and sending
stuff back to the source is probably messy - especially at the end.
Dave
> Alexey Perevalov (2):
> userfault: add pid into uffd_msg
> migration: calculate downtime on dst side
>
> include/migration/migration.h | 11 ++
> linux-headers/linux/userfaultfd.h | 1 +
> migration/migration.c | 238 +++++++++++++++++++++++++++++++++++++-
> migration/postcopy-ram.c | 61 +++++++++-
> migration/savevm.c | 2 +
> migration/trace-events | 10 +-
> 6 files changed, 319 insertions(+), 4 deletions(-)
>
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04/04/2017 10:06 PM, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
>> Hi David,
>>
>> I already asked you about downtime calculation for postcopy live migration.
>> As I remember, you said it's not worth calculating it per vCPU, or maybe I
>> understood you incorrectly. I decided to prove it could be useful.
> Thanks - apologies for taking so long to look at it.
> Some higher level thoughts:
> a) It needs to be switchable - the tree etc look like they could use a fair
>    amount of RAM.

Are you worried about the memory overhead while downtime is being calculated?
I chose a tree for lookup performance, but as an alternative it could be a
hash table. The tree population is on the hot path, but the final calculation
is not. Do you want to enable it on demand, as a capability (such as
compress)?

> b) The cpu bitmask is a problem given we can have more than 64 CPUs

Here I agree, it's not so scalable; I thought about a straightforward
char/bool array, or inventing bit operations over a memory region.

> c) Tracing the pages that took the longest can be interesting - I've done
>    graphs of latencies before - you get fun things like watching messes
>    where you lose requests and the page eventually arrives anyway after
>    a few seconds.
>
>> This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
>> of the QEMU master branch. It requires the commit "userfaultfd: provide pid
>> in userfault uffd_msg" from Andrea's git repository.
>>
>> When I tested it, I found the following moments strange:
>> 1. The first userfault always occurs due to an access to RAM in
>>    vapic_map_rom_writable; all vCPUs are sleeping at that time.
> That's probably not too surprising - I bet the vapic device load code does
> that? I've sometimes wondered about preloading the queue on the source with
> some pages that we know will need to be loaded early.

Yes, it's the vapic device initialization. Do you mean earlier than
discarding RAM blocks? I think the vapic configuration on the destination is
the same as on the source machine, so maybe it's not necessary to request it
and wait.

>
>> 2. The latter half of all userfaults was initiated by kworkers, which is
>>    why I had a doubt about whether "current" in handle_userfault inside the
>>    kernel is the proper task_struct for the pagefault initiator. All vCPUs
>>    were sleeping at that moment.
> When you say kworkers - which ones? I wonder what they are - perhaps
> incoming network packets using vhost?

No, in that scenario I used a tap network device. Unfortunately, I didn't
track down which kworkers they were, but I will.

>
>> 3. Also there is a discrepancy between the vCPU state and the real vCPU
>>    thread state.
> What do you mean by that?

I mean /proc's status versus QEMU's internal vCPU state for the pagefaulted
vCPU thread.

>
>> This patch is just for showing an idea; if you are OK with it, the non-RFC
>> patch will not include /proc access and a lot of traces.
>> Also I think it's worth keeping postcopy_downtime in MigrationIncomingState
>> and returning the calculated downtime to the source, where query-migrate
>> will be invoked.
> I don't think it's worth it; we can always ask the destination, and sending
> stuff back to the source is probably messy - especially at the end.
>
> Dave
>
>> Alexey Perevalov (2):
>>   userfault: add pid into uffd_msg
>>   migration: calculate downtime on dst side
>>
>>  include/migration/migration.h     |  11 ++
>>  linux-headers/linux/userfaultfd.h |   1 +
>>  migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
>>  migration/postcopy-ram.c          |  61 +++++++++-
>>  migration/savevm.c                |   2 +
>>  migration/trace-events            |  10 +-
>>  6 files changed, 319 insertions(+), 4 deletions(-)
>>
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
Best regards,
Alexey Perevalov
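For what it's worth, the sum-up step debated above can be sketched
independently of the tree-versus-hash-table choice: total downtime is the
union of the intervals during which every vCPU is blocked on a fault (the
S1,S2,E1 discussion in the checkpatch excerpt hints at the same rule). A
minimal sketch under that definition, with an illustrative event
representation that is not the patch's actual code:

#include <stdint.h>
#include <stdbool.h>

typedef struct FaultEvent {
    int64_t when;   /* timestamp of the event */
    int cpu;        /* faulting vCPU index */
    bool is_end;    /* false: fault began; true: the page arrived */
} FaultEvent;

/* 'ev' must be sorted by 'when'; 'blocked' is a zeroed array of ncpus
 * counters.  Accumulates only the time where all vCPUs were blocked. */
static int64_t total_downtime(const FaultEvent *ev, int nev,
                              int *blocked, int ncpus)
{
    int64_t sum = 0, all_blocked_since = 0;
    int nblocked = 0;
    int i;

    for (i = 0; i < nev; i++) {
        if (!ev[i].is_end) {
            /* if this is the vCPU's first outstanding fault and it was
             * the last vCPU still running, a downtime interval starts */
            if (blocked[ev[i].cpu]++ == 0 && ++nblocked == ncpus) {
                all_blocked_since = ev[i].when;
            }
        } else {
            /* if the vCPU has no outstanding faults left and everyone
             * was blocked until now, the downtime interval ends */
            if (--blocked[ev[i].cpu] == 0 && nblocked-- == ncpus) {
                sum += ev[i].when - all_blocked_since;
            }
        }
    }
    return sum;
}

A sweep like this runs once at the end, so the hot path only needs to record
(timestamp, cpu, begin/end) triples - which works with either a tree or a
hash table keyed by page address.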