I noticed that if a domain fails to restore, the ref count in the xattr
'trusted.libvirt.security.ref_selinux' keeps on increasing indefinitely
and the VM will never restore even if the root cause for the restore
failure has been removed. The reason seems to be that the code to decrease
the ref count never gets called because the block above it fails due
to virSecuritySELinuxTransactionAppend() failing. The simple solution
seems to be to revert the order in which things are done.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
---
src/security/security_selinux.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c
index ea20373a90..9fd29e9bca 100644
--- a/src/security/security_selinux.c
+++ b/src/security/security_selinux.c
@@ -1499,14 +1499,9 @@ virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr,
goto cleanup;
}
- if ((rc = virSecuritySELinuxTransactionAppend(path, NULL,
- false, recall, true)) < 0) {
- goto cleanup;
- } else if (rc > 0) {
- ret = 0;
- goto cleanup;
- }
-
+ /* Recall the label so the ref count label decreases its counter
+ * even if transaction append below fails.
+ */
if (recall) {
rc = virSecuritySELinuxRecallLabel(newpath, &fcon);
if (rc == -2) {
@@ -1519,6 +1514,14 @@ virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr,
}
}
+ if ((rc = virSecuritySELinuxTransactionAppend(path, NULL,
+ false, recall, true)) < 0) {
+ goto cleanup;
+ } else if (rc > 0) {
+ ret = 0;
+ goto cleanup;
+ }
+
if (!recall || rc == -2) {
if (stat(newpath, &buf) != 0) {
VIR_WARN("cannot stat %s: %s", newpath,
--
2.20.1
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On 7/26/19 8:38 PM, Stefan Berger wrote: > I noticed that if a domain fails to restore, the ref count in the xattr > 'trusted.libvirt.security.ref_selinux' keeps on increasing indefinitely > and the VM will never restore even if the root cause for the restore > failure has been removed. The reason seems to be that the code to decrease > the ref count never gets called because the block above it fails due > to virSecuritySELinuxTransactionAppend() failing. The simple solution > seems to be to revert the order in which things are done. > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > --- > src/security/security_selinux.c | 19 +++++++++++-------- > 1 file changed, 11 insertions(+), 8 deletions(-) So what was the root cause, did you ever find out? > > diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c > index ea20373a90..9fd29e9bca 100644 > --- a/src/security/security_selinux.c > +++ b/src/security/security_selinux.c > @@ -1499,14 +1499,9 @@ virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr, > goto cleanup; > } > > - if ((rc = virSecuritySELinuxTransactionAppend(path, NULL, > - false, recall, true)) < 0) { > - goto cleanup; > - } else if (rc > 0) { > - ret = 0; > - goto cleanup; > - } > - > + /* Recall the label so the ref count label decreases its counter > + * even if transaction append below fails. > + */ > if (recall) { > rc = virSecuritySELinuxRecallLabel(newpath, &fcon); > if (rc == -2) { > @@ -1519,6 +1514,14 @@ virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr, > } > } > > + if ((rc = virSecuritySELinuxTransactionAppend(path, NULL, > + false, recall, true)) < 0) { > + goto cleanup; > + } else if (rc > 0) { > + ret = 0; > + goto cleanup; > + } > + > if (!recall || rc == -2) { > if (stat(newpath, &buf) != 0) { > VIR_WARN("cannot stat %s: %s", newpath, > IIUC, this gets then called twice, doesn't it? I mean, transactions are hackish a bit. We wanted chown() and setfilecon_raw() to run inside the mount namespace of a domain. So instead of rewriting our secdrivers and making them enter the namespace on every chown()/setfilecon_raw() we put small code snippets just before these calls that would record arguments for these functions in a thread local variable and then transactionCommit() would enter the namespace and replay all chown()/setfilecon_raw(). That's why there can't be any code before transactionAppend(), because it would be called twice - once from regular call and from transactionCommit(). For instance: virSecuritySELinuxDomainSetPathLabel() -> virSecuritySELinuxSetFilecon() -> virSecuritySELinuxSetFileconHelper() -> virSecuritySELinuxTransactionAppend() virSecuritySELinuxTransactionCommit() -> virSecuritySELinuxTransactionRun() -> virSecuritySELinuxSetFileconHelper() -> virSecuritySELinuxSetFileconImpl() -> setfilecon_raw() We need to find the root cause. Maybe our set is not on par with restore, i.e. we are calling more set than restore and thus the refcounter just keeps growing? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On 7/29/19 5:17 AM, Michal Privoznik wrote: > On 7/26/19 8:38 PM, Stefan Berger wrote: >> I noticed that if a domain fails to restore, the ref count in the xattr >> 'trusted.libvirt.security.ref_selinux' keeps on increasing indefinitely >> and the VM will never restore even if the root cause for the restore >> failure has been removed. The reason seems to be that the code to >> decrease >> the ref count never gets called because the block above it fails due >> to virSecuritySELinuxTransactionAppend() failing. The simple solution >> seems to be to revert the order in which things are done. >> >> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> >> --- >> src/security/security_selinux.c | 19 +++++++++++-------- >> 1 file changed, 11 insertions(+), 8 deletions(-) > > So what was the root cause, did you ever find out? The root cause of the failure was basically me (intentionally) changing the password for the encrypted vTPM to a wrong password and then QEMU cannot *resume* 'swtpm' since it cannot decrypt the state. The operation on virt-manager level was a restore of a saved VM. > >> >> diff --git a/src/security/security_selinux.c >> b/src/security/security_selinux.c >> index ea20373a90..9fd29e9bca 100644 >> --- a/src/security/security_selinux.c >> +++ b/src/security/security_selinux.c >> @@ -1499,14 +1499,9 @@ >> virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr, >> goto cleanup; >> } >> - if ((rc = virSecuritySELinuxTransactionAppend(path, NULL, >> - false, recall, >> true)) < 0) { >> - goto cleanup; >> - } else if (rc > 0) { >> - ret = 0; >> - goto cleanup; >> - } >> - >> + /* Recall the label so the ref count label decreases its counter >> + * even if transaction append below fails. >> + */ >> if (recall) { >> rc = virSecuritySELinuxRecallLabel(newpath, &fcon); >> if (rc == -2) { >> @@ -1519,6 +1514,14 @@ >> virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr, >> } >> } >> + if ((rc = virSecuritySELinuxTransactionAppend(path, NULL, >> + false, recall, >> true)) < 0) { >> + goto cleanup; >> + } else if (rc > 0) { >> + ret = 0; >> + goto cleanup; >> + } >> + >> if (!recall || rc == -2) { >> if (stat(newpath, &buf) != 0) { >> VIR_WARN("cannot stat %s: %s", newpath, >> > > IIUC, this gets then called twice, doesn't it? I mean, transactions > are hackish a bit. We wanted chown() and setfilecon_raw() to run > inside the mount namespace of a domain. So instead of rewriting our > secdrivers and making them enter the namespace on every > chown()/setfilecon_raw() we put small code snippets just before these > calls that would record arguments for these functions in a thread > local variable and then transactionCommit() would enter the namespace > and replay all chown()/setfilecon_raw(). > > That's why there can't be any code before transactionAppend(), because > it would be called twice - once from regular call and from > transactionCommit(). For instance: > > virSecuritySELinuxDomainSetPathLabel() > -> virSecuritySELinuxSetFilecon() > -> virSecuritySELinuxSetFileconHelper() > -> virSecuritySELinuxTransactionAppend() > > virSecuritySELinuxTransactionCommit() > -> virSecuritySELinuxTransactionRun() > -> virSecuritySELinuxSetFileconHelper() > -> virSecuritySELinuxSetFileconImpl() > -> setfilecon_raw() > > We need to find the root cause. Maybe our set is not on par with > restore, i.e. we are calling more set than restore and thus the > refcounter just keeps growing? The root cause was as described above and I wanted to see how QEMU+SWTPM react if I do this. What I noticed was that the ...TransactionAppend() failed and didn't get to the code to decrease the ref count, which then ended up increasing forever. So I created this patch, which did indeed solve this particular problem but then caused Signal 11's once I provoked an error with starting (rather than resuming) a VM where again I had changed the password (iirc). Stefan > > Michal > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On 8/1/19 7:45 PM, Stefan Berger wrote: > On 7/29/19 5:17 AM, Michal Privoznik wrote: >> On 7/26/19 8:38 PM, Stefan Berger wrote: >>> I noticed that if a domain fails to restore, the ref count in the xattr >>> 'trusted.libvirt.security.ref_selinux' keeps on increasing indefinitely >>> and the VM will never restore even if the root cause for the restore >>> failure has been removed. The reason seems to be that the code to >>> decrease >>> the ref count never gets called because the block above it fails due >>> to virSecuritySELinuxTransactionAppend() failing. The simple solution >>> seems to be to revert the order in which things are done. >>> >>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> >>> --- >>> src/security/security_selinux.c | 19 +++++++++++-------- >>> 1 file changed, 11 insertions(+), 8 deletions(-) >> >> So what was the root cause, did you ever find out? > > > The root cause of the failure was basically me (intentionally) changing > the password for the encrypted vTPM to a wrong password and then QEMU > cannot *resume* 'swtpm' since it cannot decrypt the state. The operation > on virt-manager level was a restore of a saved VM. > > <snip/> > > The root cause was as described above and I wanted to see how QEMU+SWTPM > react if I do this. > > > What I noticed was that the ...TransactionAppend() failed and didn't get > to the code to decrease the ref count, which then ended up increasing > forever. So I created this patch, which did indeed solve this particular > problem but then caused Signal 11's once I provoked an error with > starting (rather than resuming) a VM where again I had changed the > password (iirc). > Ah, so TransactionAppend() failed? Huh, looking into the code that can happen only in case of OOM at which case all the bets are off IMO. But then again, getting malloc() to return NULL is not easy and certainly won't happen in the default setup. So I'm wondering what else could make TransactionAppend() fail. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
© 2016 - 2024 Red Hat, Inc.