Hi David,

I already asked you about downtime calculation for postcopy live migration.
As I remember, you said it is not worth calculating it per vCPU, or maybe I
understood you incorrectly. I decided to prove it could be useful.

This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
of the QEMU master branch. It requires the commit
"userfaultfd: provide pid in userfault uffd_msg" from Andrea's git repository.

While testing it I found the following points strange:
1. The first userfault always occurs due to an access to RAM in
vapic_map_rom_writable; all vCPUs are sleeping at that time.
2. The latter half of all userfaults was initiated by kworkers, which made me
doubt whether current in handle_userfault inside the kernel is the proper
task_struct for the pagefault initiator. All vCPUs were sleeping at that moment.
3. There is also a discrepancy between the vCPU state and the real vCPU
thread state.

This patch is just for showing an idea; if you are OK with it, the non-RFC
patch will not include the /proc access and the large number of traces.

Also I think it is worth keeping postcopy_downtime in MigrationIncomingState
and returning the calculated downtime to the source side, where query-migrate
will be invoked.

Alexey Perevalov (2):
  userfault: add pid into uffd_msg
  migration: calculate downtime on dst side

 include/migration/migration.h     |  11 ++
 linux-headers/linux/userfaultfd.h |   1 +
 migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c          |  61 +++++++++-
 migration/savevm.c                |   2 +
 migration/trace-events            |  10 +-
 6 files changed, 319 insertions(+), 4 deletions(-)

--
1.8.3.1
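The per-vCPU accounting described above boils down to computing the union of the intervals during which every vCPU is blocked on a not-yet-delivered page. A minimal standalone sketch of that calculation follows; the names FaultSpan and total_downtime are illustrative, not from the patch, and vCPUs are capped at 64 by the bitmask, as in this RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative record per faulted page, mirroring the idea in the
 * patch: when the first userfault for a page arrives, note the time
 * and the faulting vCPU; when the page is copied in, note the end. */
typedef struct {
    uint64_t begin;  /* time the first userfault for the page arrived */
    uint64_t end;    /* time the page was finally copied in */
    uint64_t cpus;   /* bitmask of vCPUs that blocked on this page */
} FaultSpan;

/* Downtime = total length of the intervals during which *every* vCPU
 * was blocked on some not-yet-arrived page.  A brute-force sweep over
 * discrete time units, for clarity only. */
static uint64_t total_downtime(const FaultSpan *spans, size_t n,
                               uint64_t all_cpus, uint64_t t0, uint64_t t1)
{
    uint64_t downtime = 0;

    for (uint64_t t = t0; t < t1; t++) {
        uint64_t blocked = 0;

        for (size_t i = 0; i < n; i++) {
            if (spans[i].begin <= t && t < spans[i].end) {
                blocked |= spans[i].cpus;
            }
        }
        if (blocked == all_cpus) {
            downtime++;  /* one time unit with all vCPUs stopped */
        }
    }
    return downtime;
}
```

With two vCPUs blocked on overlapping pages, vCPU0 during [0,10) and vCPU1 during [5,15), only the [5,10) overlap counts, giving a downtime of 5 units.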
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1489850003-5652-1-git-send-email-a.perevalov@samsung.com
Subject: [Qemu-devel] [RFC PATCH 0/2] Calcuate downtime for postcopy live migration
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1489850003-5652-1-git-send-email-a.perevalov@samsung.com -> patchew/1489850003-5652-1-git-send-email-a.perevalov@samsung.com
Switched to a new branch 'test'
2bb4f0b migration: calculate downtime on dst side
f39835b userfault: add pid into uffd_msg

=== OUTPUT BEGIN ===
Checking PATCH 1/2: userfault: add pid into uffd_msg...
Checking PATCH 2/2: migration: calculate downtime on dst side...
Argument "m" isn't numeric in numeric eq (==) at ./scripts/checkpatch.pl line 2459.
Argument "m" isn't numeric in numeric eq (==) at ./scripts/checkpatch.pl line 2459.
Use of uninitialized value $1 in concatenation (.) or string at ./scripts/checkpatch.pl line 2460.
ERROR: braces {} are necessary for all arms of this statement
#68: FILE: migration/migration.c:129:
+    if (a == b)
[...]
+    else if (a > b)
[...]

ERROR: braces {} are necessary for all arms of this statement
#70: FILE: migration/migration.c:131:
+    else if (a > b)
[...]
ERROR: unnecessary cast may hide bugs, use g_new0 instead
#131: FILE: migration/migration.c:2149:
+    dd = (DowntimeDuration *)g_malloc0(sizeof(DowntimeDuration));

ERROR: braces {} are necessary for all arms of this statement
#135: FILE: migration/migration.c:2153:
+    if (cpu < 0)
[...]
+    else
[...]

WARNING: line over 80 characters
#160: FILE: migration/migration.c:2178:
+        /* error_report("Could not populate downtime duration completion time \n\

ERROR: unnecessary whitespace before a quoted newline
#160: FILE: migration/migration.c:2178:
+        /* error_report("Could not populate downtime duration completion time \n\

ERROR: Error messages should not contain newlines
#160: FILE: migration/migration.c:2178:
+        /* error_report("Could not populate downtime duration completion time \n\

ERROR: braces {} are necessary for all arms of this statement
#193: FILE: migration/migration.c:2211:
+    if (dd->end && dd->begin)
[...]

WARNING: line over 80 characters
#194: FILE: migration/migration.c:2212:
+        trace_sumup_downtime_duration(dd->end - dd->begin, (uint64_t)key, dd->cpus);

ERROR: braces {} are necessary for all arms of this statement
#205: FILE: migration/migration.c:2223:
+    if (test_bit(cpu_iter, &dd->cpus) && dd->end && dd->begin)
[...]
ERROR: that open brace { should be on the previous line
#222: FILE: migration/migration.c:2240:
+    for (i = 0; i < smp_cpus; i++)
+    {

ERROR: braces {} are necessary even for single statement blocks
#222: FILE: migration/migration.c:2240:
+    for (i = 0; i < smp_cpus; i++)
+    {
+	set_bit(i, &sufficient_cpus);
+    }

ERROR: code indent should never use tabs
#224: FILE: migration/migration.c:2242:
+^Iset_bit(i, &sufficient_cpus);$

WARNING: line over 80 characters
#262: FILE: migration/migration.c:2280:
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3,

ERROR: that open brace { should be on the previous line
#288: FILE: migration/migration.c:2306:
+    for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+    {

ERROR: braces {} are necessary even for single statement blocks
#288: FILE: migration/migration.c:2306:
+    for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+    {
+        trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
+    }

ERROR: braces {} are necessary for all arms of this statement
#302: FILE: migration/migration.c:2320:
+    if (!od->is_end || prev_od->is_end)
[...]

ERROR: braces {} are necessary for all arms of this statement
#311: FILE: migration/migration.c:2329:
+    if (t_od->is_end)
[...]

ERROR: braces {} are necessary for all arms of this statement
#367: FILE: migration/postcopy-ram.c:431:
+        if (strstr(line, "Name"))
[...]

ERROR: braces {} are necessary for all arms of this statement
#369: FILE: migration/postcopy-ram.c:433:
+        if (strstr(line, "State"))
[...]

ERROR: suspect code indent for conditional statements (8, 11)
#382: FILE: migration/postcopy-ram.c:446:
+        if (cpu_iter->thread_id == pid)
+           return cpu_iter->cpu_index;

ERROR: braces {} are necessary for all arms of this statement
#382: FILE: migration/postcopy-ram.c:446:
+        if (cpu_iter->thread_id == pid)
[...]
WARNING: line over 80 characters
#414: FILE: migration/postcopy-ram.c:540:
+                                                rb_offset, msg.arg.pagefault.pid);

ERROR: code indent should never use tabs
#416: FILE: migration/postcopy-ram.c:542:
+^Imark_postcopy_downtime_begin(msg.arg.pagefault.address,$

ERROR: code indent should never use tabs
#417: FILE: migration/postcopy-ram.c:543:
+^I^Idefined_mem_fault_cpu_index(msg.arg.pagefault.pid));$

ERROR: code indent should never use tabs
#437: FILE: migration/savevm.c:1633:
+^Itrace_loadvm_postcopy_vm_start(get_postcopy_total_downtime());$

total: 22 errors, 4 warnings, 435 lines checked

Your patch has style problems, please review. If any of these errors are
false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1

---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
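Most of the errors flagged above reduce to two recurring QEMU style rules: braces on every arm of a conditional, and g_new0() rather than a cast around g_malloc0(). A minimal sketch of the compliant form; downtime_cmp here is a hypothetical stand-in for the comparator flagged at migration/migration.c:129, not the patch's actual code.

```c
#include <assert.h>

/* checkpatch.pl: "braces {} are necessary for all arms of this
 * statement" - every arm gets braces, even single statements. */
static int downtime_cmp(long a, long b)
{
    if (a == b) {
        return 0;
    } else if (a > b) {
        return 1;
    } else {
        return -1;
    }
}

/* For the allocation error, checkpatch wants the type-safe helper:
 *     dd = g_new0(DowntimeDuration, 1);
 * instead of:
 *     dd = (DowntimeDuration *)g_malloc0(sizeof(DowntimeDuration));
 */
```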
* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Hi David,
>
> I already asked you about downtime calculation for postcopy live migration.
> As I remember you said it's worth not to calculate it per vCPU or maybe I
> understood you incorrectly. I decided to proof it could be useful.

Thanks - apologies for taking so long to look at it.

Some higher level thoughts:
  a) It needs to be switchable - the tree etc look like they could use a fair
     amount of RAM.
  b) The cpu bitmask is a problem given we can have more than 64 CPUs.
  c) Tracing the pages that took the longest can be interesting - I've done
     graphs of latencies before - you get fun things like watching messes
     where you lose requests and the page eventually arrives anyway after
     a few seconds.

> This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
> of QEMU master branch. It requires commit into Andreas git repository
> "userfaultfd: provide pid in userfault uffd_msg"
>
> When I tested it I found following moments are strange:
> 1. First userfault always occurs due to access to ram in vapic_map_rom_writable,
> all vCPU are sleeping in this time

That's probably not too surprising - I bet the vapic device load code does
that? I've sometimes wondered about preloading the queue on the source with
some that we know will need to be loaded early.

> 2. Latest half of all userfault was initiated by kworkers, that's why I had a doubt
> about current in handle_userfault inside kernel as a proper task_struct for pagefault
> initiator. All vCPU was sleeping at that moment.

When you say kworkers - which ones? I wonder what they are - perhaps incoming
network packets using vhost?

> 3. Also there is a discrepancy, of vCPU state and real vCPU thread state.

What do you mean by that?

> This patch is just for showing and idea, if you ok with this idea none RFC patch will not
> include proc access && a lot of traces.
> Also I think it worth to guard postcopy_downtime in MigrationIncomingState and
> return calculated downtime into src, where qeury-migration will be invocked.

I don't think it's worth it, we can always ask the destination, and sending
stuff back to the source is probably messy - especially at the end.

Dave

> Alexey Perevalov (2):
>   userfault: add pid into uffd_msg
>   migration: calculate downtime on dst side
>
>  include/migration/migration.h     |  11 ++
>  linux-headers/linux/userfaultfd.h |   1 +
>  migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
>  migration/postcopy-ram.c          |  61 +++++++++-
>  migration/savevm.c                |   2 +
>  migration/trace-events            |  10 +-
>  6 files changed, 319 insertions(+), 4 deletions(-)
>
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
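Point (b) above, the 64-CPU limit of a plain unsigned long bitmask, can be avoided with a dynamically sized bitmap along the lines of QEMU's own bitmap API (include/qemu/bitmap.h). A standalone sketch under that assumption; CpuMask and its helpers are illustrative names, not QEMU code, and error handling is omitted.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* A dynamically sized CPU mask, so per-page fault tracking still
 * works past 64 vCPUs: one bit per vCPU, stored in as many
 * unsigned longs as needed. */
typedef struct {
    unsigned long *bits;
    int ncpus;
} CpuMask;

static CpuMask *cpu_mask_new(int ncpus)
{
    CpuMask *m = calloc(1, sizeof(*m));

    m->ncpus = ncpus;
    m->bits = calloc((ncpus + BITS_PER_LONG - 1) / BITS_PER_LONG,
                     sizeof(unsigned long));
    return m;
}

static void cpu_mask_set(CpuMask *m, int cpu)
{
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static bool cpu_mask_test(const CpuMask *m, int cpu)
{
    return m->bits[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG));
}
```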
On 04/04/2017 10:06 PM, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
>> Hi David,
>>
>> I already asked you about downtime calculation for postcopy live migration.
>> As I remember you said it's worth not to calculate it per vCPU or maybe I
>> understood you incorrectly. I decided to proof it could be useful.
> Thanks - apologies for taking so long to look at it.
> Some higher level thoughts:
> a) It needs to be switchable - the tree etc look like they could use a fair
> amount of RAM.

Are you worried about the memory overhead while the downtime is being
calculated? I chose a tree for its lookup performance, but a hash table
could be an alternative. The tree population is on the hot path, but the
final calculation is not. Do you want it enabled on demand, as a capability
(such as compress)?

> b) The cpu bitmask is a problem given we can have more than 64 CPUs

Here I agree, it's not so scalable; I thought about a straightforward
char/bool array, or inventing a bit operation over a memory region.

> c) Tracing the pages that took the longest can be interesting - I've done
> graphs of latencies before - you get fun things like watching messes
> where you lose requests and the page eventually arrives anyway after
> a few seconds.
>
>> This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
>> of QEMU master branch. It requires commit into Andreas git repository
>> "userfaultfd: provide pid in userfault uffd_msg"
>>
>> When I tested it I found following moments are strange:
>> 1. First userfault always occurs due to access to ram in vapic_map_rom_writable,
>> all vCPU are sleeping in this time
> That's probably not too surprising - I bet the vapic device load code does that?
> I've sometimes wondered about preloading the queue on the source with some that we know
> will need to be loaded early.

Yes, it's the vapic device initialization. Do you mean earlier than
discarding the RAM blocks?
I think the vapic configuration on the destination is the same as on the
source machine, so maybe it's not necessary to request that page and wait.

>
>> 2. Latest half of all userfault was initiated by kworkers, that's why I had a doubt
>> about current in handle_userfault inside kernel as a proper task_struct for pagefault
>> initiator. All vCPU was sleeping at that moment.
> When you say kworkers - which ones? I wonder what they are - perhaps incoming network
> packets using vhost?

No, in that scenario I used a tap network device. Unfortunately, I haven't
tracked down who they were yet, but I will.

>
>> 3. Also there is a discrepancy, of vCPU state and real vCPU thread state.
> What do you mean by that?

I mean the state in /proc/<pid>/status versus QEMU's internal vCPU state
for the pagefaulted vCPU thread.

>
>> This patch is just for showing and idea, if you ok with this idea none RFC patch will not
>> include proc access && a lot of traces.
>> Also I think it worth to guard postcopy_downtime in MigrationIncomingState and
>> return calculated downtime into src, where qeury-migration will be invocked.
> I don't think it's worth it, we can always ask the destination and sending stuff
> back to the source is probably messy - especially at the end.
>
> Dave
>
>> Alexey Perevalov (2):
>>   userfault: add pid into uffd_msg
>>   migration: calculate downtime on dst side
>>
>>  include/migration/migration.h     |  11 ++
>>  linux-headers/linux/userfaultfd.h |   1 +
>>  migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
>>  migration/postcopy-ram.c          |  61 +++++++++-
>>  migration/savevm.c                |   2 +
>>  migration/trace-events            |  10 +-
>>  6 files changed, 319 insertions(+), 4 deletions(-)
>>
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
Best regards,
Alexey Perevalov
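The /proc/<pid>/status comparison mentioned above can be sketched as a small parser for the Name:/State: fields, the same fields the RFC reads in postcopy-ram.c. This reads from a string buffer instead of the real /proc file; parse_status and its calling convention (name/state buffers of at least 32 bytes) are assumptions for the sketch, not the patch's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/<pid>/status-style buffer line by line, extracting the
 * value after "Name:" and the one-letter code after "State:".
 * Returns 1 when both fields were found. */
static int parse_status(const char *buf, char *name, char *state)
{
    const char *p = buf;
    int found = 0;

    while (p && *p) {
        if (strncmp(p, "Name:", 5) == 0 &&
            sscanf(p + 5, " %31s", name) == 1) {
            found++;
        } else if (strncmp(p, "State:", 6) == 0 &&
                   sscanf(p + 6, " %31s", state) == 1) {
            found++;
        }
        p = strchr(p, '\n');
        if (p) {
            p++;  /* step past the newline to the next line */
        }
    }
    return found == 2;
}
```

For a thread showing "State: S (sleeping)" in the kernel while QEMU still considers the vCPU running, this is exactly the kind of discrepancy point 3 refers to.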