qemu-nbd.texi | 116 +++++- Makefile | 2 + include/block/nbd.h | 31 +- block/nbd-client.c | 9 +- blockdev-nbd.c | 10 +- nbd/client.c | 775 ++++++++++++++++++++++++++----------- nbd/server.c | 24 +- qemu-nbd.c | 224 ++++++++--- nbd/trace-events | 17 +- scripts/texi2pod.pl | 2 +- tests/qemu-iotests/223 | 2 + tests/qemu-iotests/223.out | 20 + tests/qemu-iotests/233 | 30 +- tests/qemu-iotests/233.out | 19 +- 14 files changed, 942 insertions(+), 339 deletions(-)
I got tired of debugging whether a server was advertising the correct things during negotiation by inspecting the trace logs of qemu-io as client - not to mention that without SOME sort of client tracing particular commands, we can't easily regression test the server for correct behavior. The final straw was at KVM Forum, when Nir asked me to make sure there was a way to easily determine if an NBD server is exposing what we really want (and fixing x-dirty-bitmap to behave saner fell out as a result of answering that question). I note that upstream NBD has 'nbd-client -l $host' for querying just export names (with no quoting, so you have to know that a blank line means the default export), but it wasn't powerful enough, so I implemented 'qemu-nbd -L' to document everything. Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries, while we only have 'qemu-nbd' (which is normally just a server, but 'qemu-nbd -c' also operates a second thread as a client). Our other uses of qemu as NBD client are for consuming a block device (as in qemu-io, qemu-img, or a drive to qemu) - but those binaries are less suited to something so specific to the NBD protocol. Bonus: As a result of my work on this series, nbdkit now supports NBD_OPT_INFO (my interoperability testing between server implementations has been paying off, both at fixing server bugs, and at making this code more reliable across difference in valid servers). Also available at: https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4 Currently based on master. Since v3: - 1 new patch (1/21) - split old patch 15/19 into two (16,17/21) - retitle two patches (git backport-diff doesn't do too well at showing the diff on a retitled patch; 5,11/21) - fix review comments from Rich, Vladimir 001/21:[down] 'iotests: Make 233 output more reliable' 002/21:[----] [--] 'maint: Allow for EXAMPLES in texi2pod' 003/21:[----] [--] 'qemu-nbd: Enhance man page' 004/21:[0006] [FC] 'qemu-nbd: Sanity check partition bounds' 005/21:[down] 'nbd/server: Hoist length check to qmp_nbd_server_add' 006/21:[0025] [FC] 'nbd/server: Favor [u]int64_t over off_t' 007/21:[----] [-C] 'qemu-nbd: Avoid strtol open-coding' 008/21:[----] [--] 'nbd/client: Refactor nbd_receive_list()' 009/21:[----] [--] 'nbd/client: Move export name into NBDExportInfo' 010/21:[----] [--] 'nbd/client: Change signature of nbd_negotiate_simple_meta_context()' 011/21:[down] 'nbd/client: Split out nbd_send_meta_query()' 012/21:[0003] [FC] 'nbd/client: Split out nbd_receive_one_meta_context()' 013/21:[----] [--] 'nbd/client: Refactor return of nbd_receive_negotiate()' 014/21:[0003] [FC] 'nbd/client: Split handshake into two functions' 015/21:[----] [-C] 'nbd/client: Pull out oldstyle size determination' 016/21:[down] 'nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFO' 017/21:[0032] [FC] 'nbd/client: Add nbd_receive_export_list()' 018/21:[0006] [FC] 'nbd/client: Add meta contexts to nbd_receive_export_list()' 019/21:[0014] [FC] 'qemu-nbd: Add --list option' 020/21:[0004] [FC] 'nbd/client: Work around 3.0 bug for listing meta contexts' 021/21:[0002] [FC] 'iotests: Enhance 223, 233 to cover 'qemu-nbd --list'' Eric Blake (21): iotests: Make 233 output more reliable maint: Allow for EXAMPLES in texi2pod qemu-nbd: Enhance man page qemu-nbd: Sanity check partition bounds nbd/server: Hoist length check to qmp_nbd_server_add nbd/server: Favor [u]int64_t over off_t qemu-nbd: Avoid strtol open-coding nbd/client: Refactor nbd_receive_list() nbd/client: Move export name into NBDExportInfo nbd/client: Change signature of nbd_negotiate_simple_meta_context() nbd/client: Split out nbd_send_meta_query() nbd/client: Split out nbd_receive_one_meta_context() nbd/client: Refactor return of nbd_receive_negotiate() nbd/client: Split handshake into two functions nbd/client: Pull out oldstyle size determination nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFO nbd/client: Add nbd_receive_export_list() nbd/client: Add meta contexts to nbd_receive_export_list() qemu-nbd: Add --list option nbd/client: Work around 3.0 bug for listing meta contexts iotests: Enhance 223, 233 to cover 'qemu-nbd --list' qemu-nbd.texi | 116 +++++- Makefile | 2 + include/block/nbd.h | 31 +- block/nbd-client.c | 9 +- blockdev-nbd.c | 10 +- nbd/client.c | 775 ++++++++++++++++++++++++++----------- nbd/server.c | 24 +- qemu-nbd.c | 224 ++++++++--- nbd/trace-events | 17 +- scripts/texi2pod.pl | 2 +- tests/qemu-iotests/223 | 2 + tests/qemu-iotests/223.out | 20 + tests/qemu-iotests/233 | 30 +- tests/qemu-iotests/233.out | 19 +- 14 files changed, 942 insertions(+), 339 deletions(-) -- 2.20.1
17.01.2019 22:36, Eric Blake wrote: > I got tired of debugging whether a server was advertising the > correct things during negotiation by inspecting the trace > logs of qemu-io as client - not to mention that without SOME > sort of client tracing particular commands, we can't easily > regression test the server for correct behavior. The final > straw was at KVM Forum, when Nir asked me to make sure there > was a way to easily determine if an NBD server is exposing what > we really want (and fixing x-dirty-bitmap to behave saner fell > out as a result of answering that question). > > I note that upstream NBD has 'nbd-client -l $host' for querying > just export names (with no quoting, so you have to know that > a blank line means the default export), but it wasn't powerful > enough, so I implemented 'qemu-nbd -L' to document everything. > Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries, > while we only have 'qemu-nbd' (which is normally just a server, > but 'qemu-nbd -c' also operates a second thread as a client). > Our other uses of qemu as NBD client are for consuming a block > device (as in qemu-io, qemu-img, or a drive to qemu) - but those > binaries are less suited to something so specific to the NBD > protocol. > > Bonus: As a result of my work on this series, nbdkit now supports > NBD_OPT_INFO (my interoperability testing between server > implementations has been paying off, both at fixing server bugs, > and at making this code more reliable across difference in valid > servers). > > Also available at: > https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4 > > Currently based on master. > > Since v3: > - 1 new patch (1/21) > - split old patch 15/19 into two (16,17/21) > - retitle two patches (git backport-diff doesn't do too well at showing > the diff on a retitled patch; 5,11/21) > - fix review comments from Rich, Vladimir > > 001/21:[down] 'iotests: Make 233 output more reliable' > 002/21:[----] [--] 'maint: Allow for EXAMPLES in texi2pod' > 003/21:[----] [--] 'qemu-nbd: Enhance man page' Interesting, I don't get it again. Searched in outlook online, I found only v2 of it, checked spam folder and filters. Magic. anyway, I can look at https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html -- Best regards, Vladimir
18.01.2019 11:15, Vladimir Sementsov-Ogievskiy wrote: > 17.01.2019 22:36, Eric Blake wrote: >> I got tired of debugging whether a server was advertising the >> correct things during negotiation by inspecting the trace >> logs of qemu-io as client - not to mention that without SOME >> sort of client tracing particular commands, we can't easily >> regression test the server for correct behavior. The final >> straw was at KVM Forum, when Nir asked me to make sure there >> was a way to easily determine if an NBD server is exposing what >> we really want (and fixing x-dirty-bitmap to behave saner fell >> out as a result of answering that question). >> >> I note that upstream NBD has 'nbd-client -l $host' for querying >> just export names (with no quoting, so you have to know that >> a blank line means the default export), but it wasn't powerful >> enough, so I implemented 'qemu-nbd -L' to document everything. >> Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries, >> while we only have 'qemu-nbd' (which is normally just a server, >> but 'qemu-nbd -c' also operates a second thread as a client). >> Our other uses of qemu as NBD client are for consuming a block >> device (as in qemu-io, qemu-img, or a drive to qemu) - but those >> binaries are less suited to something so specific to the NBD >> protocol. >> >> Bonus: As a result of my work on this series, nbdkit now supports >> NBD_OPT_INFO (my interoperability testing between server >> implementations has been paying off, both at fixing server bugs, >> and at making this code more reliable across difference in valid >> servers). >> >> Also available at: >> https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4 >> >> Currently based on master. >> >> Since v3: >> - 1 new patch (1/21) >> - split old patch 15/19 into two (16,17/21) >> - retitle two patches (git backport-diff doesn't do too well at showing >> the diff on a retitled patch; 5,11/21) >> - fix review comments from Rich, Vladimir >> >> 001/21:[down] 'iotests: Make 233 output more reliable' >> 002/21:[----] [--] 'maint: Allow for EXAMPLES in texi2pod' >> 003/21:[----] [--] 'qemu-nbd: Enhance man page' > > Interesting, I don't get it again. Searched in outlook online, I found only v2 of it, > checked spam folder and filters. Magic. > > anyway, I can look at https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html > > just look at it in your qemu-nbd-list-v4 tag: > Only expose MBR partition @var{num}. Understands physical partitions > 1-4 and logical partitions 5-8. I'm afraid, I'm too lazy to sort out these things I don't know, so just believe. It at least corresponds to limits in code that it should be 1 <= partition <= 8 ;) > @item -c, --connect=@var{dev} > Connect @var{filename} to NBD device @var{dev} (Linux only). > @item -d, --disconnect > Disconnect the device @var{dev} (Linux only). Yes, and we now have clean code which establish this limitation > @item -e, --shared=@var{num} > Allow up to @var{num} clients to share the device (default > @samp{1}). Safe for readers, but for now, consistency is not > guaranteed between multiple writers. Hmm, don't understand, why you decided to move @samp{1}). on the following line, it looks better as it was. And with a period it's only 69 symbols, the file have a lot longer lines. > Start a server listening on port 10809 that exposes only the > guest-visible contents of a qcow2 file, with no TLS encryption, and > with the default export name (an empty string). The command is > one-shot, and will block until the first successful client > disconnects: > > @example > qemu-nbd -f qcow2 file.qcow2 > @end example > > Start a long-running server listening with encryption on port 10810, > and require clients to have a correct X.509 certificate to connect to > a 1 megabyte subset of a raw file, using the export name 'subset': > > @example > qemu-nbd \ > --object tls-creds-x509,id=tls0,endpoint=server,dir=/path/to/qemutls \ > --tls-creds tls0 -t -x subset -p 10810 \ > --image-opts driver=raw,offset=1M,length=1M,file.driver=file,file.filename=file.raw > @end example I don't know tls-related, the other options looks fine. upd (decided to run the command, with dropped tls): looks fine but not work, as s/length/size. > > Serve a read-only copy of just the first MBR partition of a guest > image over a Unix socket with as many as 5 simultaneous readers, with > a persistent process forked as a daemon: > > @example > qemu-nbd --fork -t -e 5 -s /path/to/sock -p 1 -r -f qcow2 file.qcow2 > @end example Oops. s/-s/-k/, s/-p/-P/, and idea: may be, use self-descriptive long option names in examples? > > Expose the guest-visible contents of a qcow2 file via a block device > /dev/nbd0 (and possibly creating /dev/nbd0p1 and friends for > partitions found within), then disconnect the device when done. > @emph{CAUTION}: Do not use this method to mount filesystems from an > untrusted guest image - a malicious guest may have prepared the image > to attempt to trigger kernel bugs in partition probing or file system > mounting. > > @example should we note, that nbd kernel-module should be loaded for this to work? > qemu-nbd -c /dev/nbd0 -f qcow2 file.qcow2 > qemu-nbd -d /dev/nbd0 > @end example With fixed wrong option names and s/length/size: for 03/21: Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> -- Best regards, Vladimir
On 1/18/19 7:47 AM, Vladimir Sementsov-Ogievskiy wrote: >> >> Interesting, I don't get it again. Searched in outlook online, I found only v2 of it, >> checked spam folder and filters. Magic. Weird indeed. Maybe there's enough @ signs due to .texi content in there that a spam filter somewhere along the line ate it silently? >> >> anyway, I can look at https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html >> >> > > just look at it in your qemu-nbd-list-v4 tag: Thanks for that extra effort. > > > Only expose MBR partition @var{num}. Understands physical partitions > > 1-4 and logical partitions 5-8. > > I'm afraid, I'm too lazy to sort out these things I don't know, so just believe. It at least > corresponds to limits in code that it should be 1 <= partition <= 8 ;) > > > @item -c, --connect=@var{dev} > > Connect @var{filename} to NBD device @var{dev} (Linux only). > > @item -d, --disconnect > > Disconnect the device @var{dev} (Linux only). > > Yes, and we now have clean code which establish this limitation > > > @item -e, --shared=@var{num} > > Allow up to @var{num} clients to share the device (default > > @samp{1}). Safe for readers, but for now, consistency is not > > guaranteed between multiple writers. > > Hmm, don't understand, why you decided to move @samp{1}). on the following line, it looks > better as it was. And with a period it's only 69 symbols, the file have a lot longer lines. Emacs auto-reflowed the line as I edited the paragraph. End-of-lines doesn't really matter in .texi files (it gets reflowed again in the creation of output documentation from the .texi input), so if my editor tries to stick to certain column lengths, I don't really fight it. > > @example > > qemu-nbd \ > > --object tls-creds-x509,id=tls0,endpoint=server,dir=/path/to/qemutls \ > > --tls-creds tls0 -t -x subset -p 10810 \ > > --image-opts driver=raw,offset=1M,length=1M,file.driver=file,file.filename=file.raw > > @end example > > I don't know tls-related, the other options looks fine. I looked through past git logs to see the command lines that Dan used when initially implementing things; as well as comparison to iotest 233. > > upd (decided to run the command, with dropped tls): > looks fine but not work, as s/length/size. Oh, good catch! Will fix. > > > > > Serve a read-only copy of just the first MBR partition of a guest > > image over a Unix socket with as many as 5 simultaneous readers, with > > a persistent process forked as a daemon: > > > > @example > > qemu-nbd --fork -t -e 5 -s /path/to/sock -p 1 -r -f qcow2 file.qcow2 > > @end example > > Oops. s/-s/-k/, s/-p/-P/, Ouch. Yes, thanks for catching that. and idea: may be, use self-descriptive long option names in > examples? Hmm. The nice part about short options is that they are less typing and shorter lines; but verbose options are also useful. Maybe listing the same example twice, in both the short and long form? > > > > > Expose the guest-visible contents of a qcow2 file via a block device > > /dev/nbd0 (and possibly creating /dev/nbd0p1 and friends for > > partitions found within), then disconnect the device when done. > > @emph{CAUTION}: Do not use this method to mount filesystems from an > > untrusted guest image - a malicious guest may have prepared the image > > to attempt to trigger kernel bugs in partition probing or file system > > mounting. > > > > @example > > should we note, that nbd kernel-module should be loaded for this to work? Yes, that's worth mentioning (some distros include that module to be running or autoload already, but it's not universal, so such a note is indeed warranted). > > > qemu-nbd -c /dev/nbd0 -f qcow2 file.qcow2 > > qemu-nbd -d /dev/nbd0 > > @end example > > With fixed wrong option names and s/length/size: > for 03/21: > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > > -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On 1/18/19 7:47 AM, Vladimir Sementsov-Ogievskiy wrote: > > > Only expose MBR partition @var{num}. Understands physical partitions > > 1-4 and logical partitions 5-8. > > I'm afraid, I'm too lazy to sort out these things I don't know, so just believe. It at least > corresponds to limits in code that it should be 1 <= partition <= 8 ;) It matches the code, but I just learned the code is buggy for anything larger than 5. According to http://tldp.org/HOWTO/Large-Disk-HOWTO-13.html, MBR Extended/Logical partitions form a linked-list, something like: MBR: EBR1 EBR2 +----------+ +----------+ +----------+ + Part 1 + |->+ Part 5 + |->+ Part 6 + + Part 2 + | + Next --------| + 0 + + Part 3 + | + 0 + + 0 + + Part 4 ------- + 0 + + 0 + +----------+ +----------+ +----------+ In reality, ANY of part 1-4 can point to the first EBR, as long as there is at most one primary partition pointing to the extended partition - even crazier is that there are different magic numbers in historical use, such as type 5 for Part 3 and type 85 for Part 4, where MS-DOS would treat Partition 3 as the extended partition and ignore Part 4, but Linux would follow both chains (where the second extended partition thus allowed Linux access to logical partitions residing in space beyond what DOS could access). But once you start the extended partition chain, all logical partitions within the chain each have their own EBR table with one entry for the partition and the next entry pointing to the next EBR, meaning you can have more than 8 logical partitions (provided your disk is big enough). But our find_partition() code stupidly assumes that if there is any extended partition type in the MBR (recognizing only type 5 or type F, but not Linux' type 85), then all four entries in that EBR are logical partitions 5-8 (and no more, rather than a linked list chain, as in: MBR: EBR +----------+ +----------+ + Part 1 + |->+ Part 5 + + Part 2 + | + Part 6 + + Part 3 + | + Part 7 + + Part 4 ------- + Part 8 + +----------+ +----------+ It's highly unlikely that there are any BIOS implementations that would actually recognize such a partition beyond 5 as being bootable (there might be OSs which are a bit more tolerant, since MBR doesn't seem to have any one hard canonical specification). I'd have to compare what the Linux kernel MBR code does, to see if we even stand a chance of being interoperable in any manner. It doesn't help that nbdkit does not yet support Logical MBR partitions. Oh well, another project for another day; this documentation change is going in as-is because it at least matches the code, even if the code is buggy. (I'm tempted to fix nbdkit to fully support MBR logical partitions, then rip out the partitioning code in qemu as redundant as you could get it via nbdkit if you really need it - qemu-nbd's -P 8 has unchanged, and thus buggy, since at least commit 7a5ca8648 in May 2008, with no user complaining of a bug for 11 years) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Fri, Jan 18, 2019 at 04:47:42PM -0600, Eric Blake wrote: > It matches the code, but I just learned the code is buggy for anything > larger than 5. According to > http://tldp.org/HOWTO/Large-Disk-HOWTO-13.html, MBR Extended/Logical > partitions form a linked-list, something like: > > MBR: EBR1 EBR2 > +----------+ +----------+ +----------+ > + Part 1 + |->+ Part 5 + |->+ Part 6 + > + Part 2 + | + Next --------| + 0 + > + Part 3 + | + 0 + + 0 + > + Part 4 ------- + 0 + + 0 + > +----------+ +----------+ +----------+ Wikipedia's description is a bit better: https://en.wikipedia.org/wiki/Extended_boot_record In fact it's not strictly a linked list because the start record of each entry is relative to the extended partition start. It's all a bit weird. But you're right, nbdkit can neither read (nbdkit-partition-filter) nor write (nbdkit-partitioning-plugin) MBR logical partitions so we can't test interop or tell people to use nbdkit for this corner case. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Attached is a NON-working patch to nbdkit-partitioning-plugin which adds logical partition support. I don't think I've fully understood how the EBR fields are supposed to be initialized (or else they don't work how is described in online documentation). This actually causes parted to print an internal error, while fdisk gives a warning and the size of the logical partition is wrong. Anyway I've run out of time to work on it this weekend, but I may be able to have another look next week. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html From 8d6dd4288064e9b880eccb1a6b1b7a6b03f2ba96 Mon Sep 17 00:00:00 2001 From: "Richard W.M. Jones" <rjones@redhat.com> Date: Sat, 19 Jan 2019 10:30:28 +0000 Subject: [PATCH] partitioning: Support MBR logical partitions. XXX NEEDS TESTS --- .../nbdkit-partitioning-plugin.pod | 25 +--- plugins/partitioning/virtual-disk.h | 15 +- plugins/partitioning/partition-gpt.c | 2 +- plugins/partitioning/partition-mbr.c | 134 +++++++++++++++--- plugins/partitioning/partitioning.c | 31 ++-- plugins/partitioning/virtual-disk.c | 42 +++++- 6 files changed, 184 insertions(+), 65 deletions(-) diff --git a/plugins/partitioning/nbdkit-partitioning-plugin.pod b/plugins/partitioning/nbdkit-partitioning-plugin.pod index 57a1133..edb0776 100644 --- a/plugins/partitioning/nbdkit-partitioning-plugin.pod +++ b/plugins/partitioning/nbdkit-partitioning-plugin.pod @@ -27,23 +27,12 @@ access use the I<-r> flag. =head2 Partition table type -You can choose either an MBR partition table, which is limited to 4 -partitions, or a GPT partition table. In theory GPT supports an -unlimited number of partitions. - -The rule for selecting the partition table type is: +Using the C<partition-type> parameter you can choose either an MBR or +a GPT partition table. If this parameter is I<not> present then: =over 4 -=item C<partition-type=mbr> parameter on the command line - -⇒ MBR is selected - -=item C<partition-type=gpt> parameter on the command line - -⇒ GPT is selected - -=item else, number of files E<gt> 4 +=item number of files E<gt> 4 ⇒ GPT @@ -131,7 +120,9 @@ C<./> (absolute paths do not need modification). =item B<partition-type=mbr> Add an MBR (DOS-style) partition table. The MBR format is maximally -compatible with all clients, but only supports up to 4 partitions. +compatible with all clients. If there are E<gt> 4 partitions then an +extended partition is created +(L<https://en.wikipedia.org/wiki/Extended_boot_record>). =item B<partition-type=gpt> @@ -163,10 +154,6 @@ a Linux filesystem. =head1 LIMITS -This plugin only supports B<primary> MBR partitions, hence the limit -of 4 partitions with MBR. This might be increased in future if we -implement support for logical/extended MBR partitions. - Although this plugin can create GPT partition tables containing more than 128 GPT partitions (in fact, unlimited numbers of partitions), some clients will not be able to handle this. diff --git a/plugins/partitioning/virtual-disk.h b/plugins/partitioning/virtual-disk.h index 3860f46..f59df70 100644 --- a/plugins/partitioning/virtual-disk.h +++ b/plugins/partitioning/virtual-disk.h @@ -34,6 +34,7 @@ #ifndef NBDKIT_VIRTUAL_DISK_H #define NBDKIT_VIRTUAL_DISK_H +#include <stdbool.h> #include <sys/types.h> #include <sys/stat.h> @@ -103,7 +104,7 @@ extern struct file *files; extern size_t nr_files; extern struct regions regions; -extern unsigned char *primary, *secondary; +extern unsigned char *primary, *secondary, **ebr; /* Main entry point called after files array has been populated. */ extern int create_virtual_disk_layout (void); @@ -114,16 +115,16 @@ extern int create_virtual_disk_layout (void); extern int parse_guid (const char *str, char *out) __attribute__((__nonnull__ (1, 2))); -/* Internal functions for creating MBR and GPT layouts. These are - * published here because the GPT code calls into the MBR code, but - * are not meant to be called from the main plugin. +/* Internal function for creating a single MBR PTE. The GPT code + * calls this for creating the protective MBR. */ -extern void create_mbr_partition_table (unsigned char *out) - __attribute__((__nonnull__ (1))); extern void create_mbr_partition_table_entry (const struct region *, - int bootable, int partition_id, + bool bootable, int partition_id, unsigned char *) __attribute__((__nonnull__ (1, 4))); + +/* Create MBR or GPT layout. */ +extern void create_mbr_layout (void); extern void create_gpt_layout (void); #endif /* NBDKIT_VIRTUAL_DISK_H */ diff --git a/plugins/partitioning/partition-gpt.c b/plugins/partitioning/partition-gpt.c index 5fb7602..be52e72 100644 --- a/plugins/partitioning/partition-gpt.c +++ b/plugins/partitioning/partition-gpt.c @@ -210,7 +210,7 @@ create_gpt_protective_mbr (unsigned char *out) region.end = end; region.len = region.end - region.start + 1; - create_mbr_partition_table_entry (®ion, 0, 0xee, &out[0x1be]); + create_mbr_partition_table_entry (®ion, false, 0xee, &out[0x1be]); /* Boot signature. */ out[0x1fe] = 0x55; diff --git a/plugins/partitioning/partition-mbr.c b/plugins/partitioning/partition-mbr.c index d3a0d78..6f76432 100644 --- a/plugins/partitioning/partition-mbr.c +++ b/plugins/partitioning/partition-mbr.c @@ -49,27 +49,125 @@ #include "regions.h" #include "virtual-disk.h" -/* Create the partition table. */ +static const struct region *find_file_region (size_t i, size_t *j); +static const struct region *find_ebr_region (size_t i, size_t *j); + +/* Create the MBR and optionally EBRs. */ void -create_mbr_partition_table (unsigned char *out) +create_mbr_layout (void) { - size_t i, j; - - for (j = 0; j < nr_regions (®ions); ++j) { - const struct region *region = get_region (®ions, j); - - if (region->type == region_file) { - i = region->u.i; - assert (i < 4); - create_mbr_partition_table_entry (region, i == 0, - files[i].mbr_id, - &out[0x1be + 16*i]); - } - } + size_t i, j = 0; /* Boot signature. */ - out[0x1fe] = 0x55; - out[0x1ff] = 0xaa; + primary[0x1fe] = 0x55; + primary[0x1ff] = 0xaa; + + if (nr_files <= 4) { + /* Basic MBR with no extended partition. */ + for (i = 0; i < nr_files; ++i) { + const struct region *region = find_file_region (i, &j); + + create_mbr_partition_table_entry (region, i == 0, files[i].mbr_id, + &primary[0x1be + 16*i]); + } + } + else { + struct region region; + const struct region *rptr, *eptr0, *eptr; + + /* The first three primary partitions correspond to the first + * three files. + */ + for (i = 0; i < 3; ++i) { + rptr = find_file_region (i, &j); + create_mbr_partition_table_entry (rptr, i == 0, files[i].mbr_id, + &primary[0x1be + 16*i]); + } + + /* The fourth partition is an extended PTE and does not correspond + * to any file. This partition starts with the first EBR, so find + * it. The partition extends to the end of the disk. + */ + eptr0 = find_ebr_region (3, &j); + region.start = eptr0->start; + region.end = virtual_size (®ions) - 1; /* to end of disk */ + region.len = region.end - region.start + 1; + create_mbr_partition_table_entry (®ion, false, 0xf, &primary[0x1ee]); + + /* The remaining files are mapped to logical partitions living in + * the fourth extended partition. + */ + for (i = 3; i < nr_files; ++i) { + if (i == 3) + eptr = eptr0; + else + eptr = find_ebr_region (i, &j); + rptr = find_file_region (i, &j); + + /* Signature. */ + ebr[i-3][0x1fe] = 0x55; + ebr[i-3][0x1ff] = 0xaa; + + /* First entry in EBR contains: + * offset from EBR sector to the first sector of the logical partition + * total count of sectors in the logical partition + */ + region.start = rptr->start - eptr->start; + region.len = rptr->len; + create_mbr_partition_table_entry (®ion, false, files[i].mbr_id, + &ebr[i-3][0x1be]); + + if (i < nr_files-1) { + size_t j2 = j; + const struct region *enext = find_ebr_region (i+1, &j2); + const struct region *rnext = find_file_region (i+1, &j2); + + /* Second entry in the EBR contains: + * address of next EBR relative to extended partition + * total count of sectors in the next logical partition including + * next EBR + */ + region.start = enext->start - eptr0->start; + region.len = rnext->end - enext->start + 1; + create_mbr_partition_table_entry (®ion, false, files[i+1].mbr_id, + &ebr[i-3][0x1ce]); + } + } + } +} + +/* Find the region corresponding to file[i]. + * j is a scratch register ensuring we only do a linear scan. + */ +static const struct region * +find_file_region (size_t i, size_t *j) +{ + const struct region *region; + + for (; *j < nr_regions (®ions); ++(*j)) { + region = get_region (®ions, *j); + if (region->type == region_file && region->u.i == i) + return region; + } + abort (); +} + +/* Find the region corresponding to EBR of file[i] (i >= 3). + * j is a scratch register ensuring we only do a linear scan. + */ +static const struct region * +find_ebr_region (size_t i, size_t *j) +{ + const struct region *region; + + assert (i >= 3); + + for (; *j < nr_regions (®ions); ++(*j)) { + region = get_region (®ions, *j); + if (region->type == region_data && region->u.data == ebr[i-3]) + return region; + } + abort (); } static void @@ -84,7 +182,7 @@ chs_too_large (unsigned char *out) void create_mbr_partition_table_entry (const struct region *region, - int bootable, int partition_id, + bool bootable, int partition_id, unsigned char *out) { uint64_t start_sector, nr_sectors; diff --git a/plugins/partitioning/partitioning.c b/plugins/partitioning/partitioning.c index 205c059..689cb24 100644 --- a/plugins/partitioning/partitioning.c +++ b/plugins/partitioning/partitioning.c @@ -35,6 +35,7 @@ #include <stdio.h> #include <stdlib.h> +#include <stdbool.h> #include <stdint.h> #include <inttypes.h> #include <string.h> @@ -81,8 +82,11 @@ size_t nr_files = 0; /* Virtual disk layout. */ struct regions regions; -/* Primary and secondary partition tables (secondary is only used for GPT). */ -unsigned char *primary = NULL, *secondary = NULL; +/* Primary and secondary partition tables and extended boot records. + * Secondary PT is only used for GPT. EBR array of sectors is only + * used for MBR with > 4 partitions and has length equal to nr_files-3. + */ +unsigned char *primary = NULL, *secondary = NULL, **ebr = NULL; /* Used to generate random unique partition GUIDs for GPT. */ static struct random_state random_state; @@ -105,12 +109,17 @@ partitioning_unload (void) free (files); /* We don't need to free regions.regions[].u.data because it points - * to either primary or secondary which we free here. + * to primary, secondary or ebr which we free here. */ free_regions (®ions); free (primary); free (secondary); + if (ebr) { + for (i = 0; i < nr_files-3; ++i) + free (ebr[i]); + free (ebr); + } } static int @@ -225,7 +234,7 @@ partitioning_config_complete (void) { size_t i; uint64_t total_size; - int needs_gpt; + bool needs_gpt; /* Not enough / too many files? */ if (nr_files == 0) { @@ -236,17 +245,11 @@ partitioning_config_complete (void) total_size = 0; for (i = 0; i < nr_files; ++i) total_size += files[i].statbuf.st_size; - - if (nr_files > 4) - needs_gpt = 1; - else if (total_size > MAX_MBR_DISK_SIZE) - needs_gpt = 1; - else - needs_gpt = 0; + needs_gpt = total_size > MAX_MBR_DISK_SIZE; /* Choose default parttype if not set. */ if (parttype == PARTTYPE_UNSET) { - if (needs_gpt) { + if (needs_gpt || nr_files > 4) { parttype = PARTTYPE_GPT; nbdkit_debug ("picking partition type GPT"); } @@ -256,8 +259,8 @@ partitioning_config_complete (void) } } else if (parttype == PARTTYPE_MBR && needs_gpt) { - nbdkit_error ("MBR partition table type supports a maximum of 4 partitions " - "and a maximum virtual disk size of about 2 TB, " + nbdkit_error ("MBR partition table type supports " + "a maximum virtual disk size of about 2 TB, " "but you requested %zu partition(s) " "and a total size of %" PRIu64 " bytes (> %" PRIu64 "). " "Try using: partition-type=gpt", diff --git a/plugins/partitioning/virtual-disk.c b/plugins/partitioning/virtual-disk.c index e2d72bc..4fe186e 100644 --- a/plugins/partitioning/virtual-disk.c +++ b/plugins/partitioning/virtual-disk.c @@ -69,6 +69,25 @@ create_virtual_disk_layout (void) nbdkit_error ("malloc: %m"); return -1; } + + if (nr_files > 4) { + /* The first 3 primary partitions will be real partitions, the + * 4th will be an extended partition, and so we need to store + * EBRs for nr_files-3 logical partitions. + */ + ebr = malloc (sizeof (unsigned char *) * (nr_files-3)); + if (ebr == NULL) { + nbdkit_error ("malloc: %m"); + return -1; + } + for (i = 0; i < nr_files-3; ++i) { + ebr[i] = calloc (1, SECTOR_SIZE); + if (ebr[i] == NULL) { + nbdkit_error ("malloc: %m"); + return -1; + } + } + } } else /* PARTTYPE_GPT */ { /* Protective MBR + PT header + PTA = 2 + GPT_PTA_LBAs */ @@ -117,6 +136,20 @@ create_virtual_disk_layout (void) */ assert (IS_ALIGNED (offset, SECTOR_SIZE)); + /* Logical partitions are preceeded by an EBR. */ + if (parttype == PARTTYPE_MBR && nr_files > 4 && i >= 3) { + region.start = offset; + region.len = SECTOR_SIZE; + region.end = region.start + region.len - 1; + region.type = region_data; + region.u.data = ebr[i-3]; + region.description = "EBR"; + if (append_region (®ions, region) == -1) + return -1; + + offset = virtual_size (®ions); + } + /* Make sure each partition is aligned for best performance. */ if (!IS_ALIGNED (offset, files[i].alignment)) { region.start = offset; @@ -207,13 +240,10 @@ create_partition_table (void) if (parttype == PARTTYPE_GPT) assert (secondary != NULL); - if (parttype == PARTTYPE_MBR) { - assert (nr_files <= 4); - create_mbr_partition_table (primary); - } - else /* parttype == PARTTYPE_GPT */ { + if (parttype == PARTTYPE_MBR) + create_mbr_layout (); + else /* parttype == PARTTYPE_GPT */ create_gpt_layout (); - } return 0; } -- 2.20.1
On Sat, Jan 19, 2019 at 11:39:41AM +0000, Richard W.M. Jones wrote: > > Attached is a NON-working patch to nbdkit-partitioning-plugin which > adds logical partition support. I don't think I've fully understood > how the EBR fields are supposed to be initialized (or else they don't > work how is described in online documentation). This actually causes > parted to print an internal error, while fdisk gives a warning and the > size of the logical partition is wrong. > > Anyway I've run out of time to work on it this weekend, but I may be > able to have another look next week. Sheesh. Apparently the type byte in the link field must be the extended partition type byte (eg. 0x5 or 0xf), NOT the type of the next partition. This fixes it: $ git diff diff --git a/plugins/partitioning/partition-mbr.c b/plugins/partitioning/partition-mbr.c index 6f76432..6b256d1 100644 --- a/plugins/partitioning/partition-mbr.c +++ b/plugins/partitioning/partition-mbr.c @@ -129,7 +129,7 @@ create_mbr_layout (void) */ region.start = enext->start - eptr0->start; region.len = rnext->end - enext->start + 1; - create_mbr_partition_table_entry (®ion, false, files[i+1].mbr_id, + create_mbr_partition_table_entry (®ion, false, 0xf, &ebr[i-3][0x1ce]); } } I'll try to come up with a better patch with tests etc. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
On 1/17/19 1:36 PM, Eric Blake wrote: > I got tired of debugging whether a server was advertising the > correct things during negotiation by inspecting the trace > logs of qemu-io as client - not to mention that without SOME > sort of client tracing particular commands, we can't easily > regression test the server for correct behavior. The final > straw was at KVM Forum, when Nir asked me to make sure there > was a way to easily determine if an NBD server is exposing what > we really want (and fixing x-dirty-bitmap to behave saner fell > out as a result of answering that question). > > I note that upstream NBD has 'nbd-client -l $host' for querying > just export names (with no quoting, so you have to know that > a blank line means the default export), but it wasn't powerful > enough, so I implemented 'qemu-nbd -L' to document everything. > Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries, > while we only have 'qemu-nbd' (which is normally just a server, > but 'qemu-nbd -c' also operates a second thread as a client). > Our other uses of qemu as NBD client are for consuming a block > device (as in qemu-io, qemu-img, or a drive to qemu) - but those > binaries are less suited to something so specific to the NBD > protocol. > > Bonus: As a result of my work on this series, nbdkit now supports > NBD_OPT_INFO (my interoperability testing between server > implementations has been paying off, both at fixing server bugs, > and at making this code more reliable across difference in valid > servers). > > Also available at: > https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4 Vladimir spotted a few more tweaks to make, but they all look sufficiently minor that I'll go ahead and queue this on my NBD tree for a pull request on Monday, rather than posting a v5, if there aren't any other major comments in the meantime. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
© 2016 - 2024 Red Hat, Inc.