On 9/21/22 1:44 PM, Jonathon Jongsma wrote:
> On 9/19/22 8:48 AM, Peter Krempa wrote:
>> On Wed, Aug 31, 2022 at 13:40:45 -0500, Jonathon Jongsma wrote:
>>> After a bit of a lengthy delay, this is the second version of this patch
>>> series. See https://bugzilla.redhat.com/show_bug.cgi?id=2016527 for more
>>> information about the goal, but the summary is that RHEL does not want to
>>> ship the qemu storage plugins for curl and ssh. Handling them outside of
>>> the qemu process provides several advantages such as reduced attack
>>> surface and stability.
>>
>> IMO it's also worth noting that it increases the complexity of the setup
>> and potentially also resource usage.
>>
>>> A quick summary of the code:
>>
>> [...]
>>
>>> Open questions
>>> - selinux: I need some help from people more familiar with selinux to
>>> figure out what is needed here. When selinux is enforcing, I get a
>>> failure to launch nbdkit to serve the disks. I suspect we need a new
>>> context and policy for /usr/sbin/nbdkit that allows it to transition to
>>> the appropriate selinux context. The current context (on fedora) is
>>> "system_u:object_r:bin_t:s0". When I (temporarily) change the context to
>>> something like qemu_exec_t, I am able to start nbdkit and the domain
>>> launches.
>>
>> Is the problem with starting the 'nbdkit' process itself or with the
>> socket?
>
> The problem seems to be with the nbdkit process itself. Because I use
> qemuSecurityCommandRun() to launch nbdkit, it sets the security label
> for the nbdkit process to be the security label for the domain and then
> attempts to launch it. Presumably there is no rule allowing the binaries
> with the bin_t context to transition to the label for the domain. But as
> I said, my selinux knowledge is quite shallow, so I may need some help
> figuring out the proper way to do this.
Just for completeness, here's some additional information from
SETroubleshoot when trying to start a domain using nbdkit with my
patches. In this case I'm just executing the monolithic libvirtd daemon
from my working directory for testing. The domain fails to start:
SELinux is preventing rpc-libvirtd from entrypoint access on the file
/usr/sbin/nbdkit.
Additional Information:
Source Context    unconfined_u:unconfined_r:svirt_t:s0:c129,c164
Target Context    system_u:object_r:bin_t:s0
Target Objects    /usr/sbin/nbdkit [ file ]
Source            rpc-libvirtd
Source Path       rpc-libvirtd
Port              <Unknown>
Raw Audit Messages
type=AVC msg=audit(1663876079.221:7519): avc: denied { entrypoint }
for pid=1750906 comm="rpc-libvirtd" path="/usr/sbin/nbdkit" dev="dm-1"
ino=3160232 scontext=unconfined_u:unconfined_r:svirt_t:s0:c129,c164
tcontext=system_u:object_r:bin_t:s0 tclass=file permissive=0
Then if I temporarily do something like:
# chcon -t qemu_exec_t /usr/sbin/nbdkit
nbdkit (and the domain) starts fine.
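If it helps the discussion, I imagine the eventual fix is roughly of this
shape (a completely untested sketch; the module and type names here are
invented for illustration, and the real fix presumably belongs in the
distro virt policy rather than a local module):

```
# untested sketch -- module/type names are invented for illustration
policy_module(nbdkit_local, 1.0)

require {
    type svirt_t;
}

# give /usr/sbin/nbdkit a dedicated type instead of plain bin_t
type nbdkit_exec_t;
files_type(nbdkit_exec_t)

# let svirt-labeled domains enter a process through the nbdkit binary,
# which is exactly the { entrypoint } access the AVC above denies
allow svirt_t nbdkit_exec_t:file { read execute entrypoint };
```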
>> At least in case of the socket we must make sure that no other process
>> can access it, especially once you pass authentication to nbdkit, to avoid
>> any kind of backdoor to authenticated storage.
>
> nbdkit does have a --selinux-label argument, which will set the label of
> the socket.
>
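For what it's worth, the invocation would look roughly like this (the
socket path, label, and URL below are placeholders for illustration, not
what the patches actually generate):

```
# illustrative only -- paths, label, and URL are placeholders
nbdkit --unix /run/libvirt/qemu/nbdkit-guest.sock \
       --selinux-label system_u:object_r:svirt_image_t:s0 \
       curl url=https://example.com/disk.img
```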
>>
>> Few more open questions:
>> - What if 'nbdkit' crashes
>> With the integrated block layer, a crash takes down the whole VM. Now
>> that we have separate processes accessing the disks (this is also an
>> issue for the qemu-storage-daemon), a crash of any of the helper
>> processes puts us in a new situation.
>>
>> I think we'll need to think about:
>> - adding an event for any of the helper processes failing
>> - adding a lifecycle action for it (e.g. pause qemu if nbdkit
>> dies)
>> - think about possible recovery of the situation
>
> Indeed. A reusable mechanism that could also cover qemu-storage-daemon
> seems worthwhile. I'll look into it.
>
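To make the shape of it concrete, here's a rough sketch in plain Python
(names are invented; libvirt's implementation would of course wait
asynchronously via its event loop and emit a proper domain event instead
of blocking):

```python
import subprocess
import sys

def run_helper(argv, on_exit):
    """Spawn a helper process (nbdkit, qemu-storage-daemon, ...) and
    report its exit status to a callback. The callback is where a
    lifecycle action would be applied, e.g. emit an event or pause
    the guest. A real implementation would reap the child from
    SIGCHLD/an event loop rather than block in wait()."""
    proc = subprocess.Popen(argv)
    rc = proc.wait()
    on_exit(rc)
    return rc

# example lifecycle action: only react to abnormal helper exits
def pause_guest_on_failure(rc):
    if rc != 0:
        print("helper died (rc=%d): pausing guest" % rc)
```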
>>
>> - resource pinning
>> For now all the resources are integral to the qemu process so
>> emulator and iothread pinning can be used to steer which cpus the
>> disk should use. As we add new, possibly cpu-intensive, processes we'll
>> probably need to consider how to handle them more generally and manage
>> their resources.
>
> Good point. I hadn't really thought about this.
>
>>
>> - Integration with qemu storage daemon used for a VM
>>
>> With the attempt to rewrite QSD in other languages it will possibly
>> make sense to run it instead of the native qemu block layer (e.g.
>> take advantage of memory safe languages). So we should also think
>> about how these two will be able to coexist.
>>
>> The last point is more of a future-work thing to consider, but the
>> first two points should be considered for the final release of this
>> feature. Specifically because in your current design it replaces the
>> in-qemu driver even in cases when it is compiled into qemu, thus also
>> for existing users. Alternatively it would have to be opt-in.
>
> As far as I can tell, there is currently no good way to determine (via
> capabilities or otherwise) whether e.g. the curl driver is compiled into
> a particular qemu binary. If I've missed something, please let me know.
> My recollection of previous discussions was that I should try to use
> nbdkit if available and then fall back to trying the built-in qemu
> driver approach. But we could change this strategy if there's a good way
> to do it. When you talk about opt-in, do you mean per-domain, e.g. in
> the xml? Or some other mechanism?
>
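To spell out the fallback strategy I described (an illustrative Python
sketch, not libvirt code; the function name and protocol list are
invented):

```python
import shutil

# protocols the nbdkit curl/ssh plugins could serve (illustrative list)
NBDKIT_PROTOCOLS = {"http", "https", "ftp", "ftps", "ssh"}

def pick_disk_backend(protocol):
    """Prefer an external nbdkit process when the binary is installed;
    otherwise fall back to qemu's built-in block driver, since we
    currently cannot probe via capabilities whether that driver was
    compiled into a given qemu binary."""
    if protocol in NBDKIT_PROTOCOLS and shutil.which("nbdkit"):
        return "nbdkit"
    return "qemu-builtin"
```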
>>
>>> Known shortcomings
>>> - creating disks (in ssh) still isn't supported. I wanted to send out
>>> the patch series anyway since it's been delayed too long already.
>>
>> That shouldn't be a problem, there's plenty of protocols where we don't
>> support creating the storage. Creating storage is needed only for
>> snapshots, so we can simply refuse to do it in the first place.
>>
>