[PATCH] tools/xenstore: claim resources when running as daemon

Juergen Gross posted 1 patch 3 years, 6 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20210514084133.18658-1-jgross@suse.com
tools/xenstore/xenstored_core.c   |  2 ++
tools/xenstore/xenstored_core.h   |  3 ++
tools/xenstore/xenstored_minios.c |  4 +++
tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
4 files changed, 55 insertions(+)
[PATCH] tools/xenstore: claim resources when running as daemon
Posted by Juergen Gross 3 years, 6 months ago
Xenstored is absolutely mandatory for a Xen host and it can't be
restarted, so being killed by OOM-killer in case of memory shortage is
to be avoided.

Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
xenstored to use large amounts of memory without being killed.

In order to support large numbers of domains the limit for open file
descriptors might need to be raised. Each domain needs 2 file
descriptors (one for the event channel and one for the xl per-domain
daemon to connect to xenstored).

Try to raise ulimit for open files to 65536. First the hard limit if
needed, and then the soft limit.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   |  2 ++
 tools/xenstore/xenstored_core.h   |  3 ++
 tools/xenstore/xenstored_minios.c |  4 +++
 tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index b66d119a98..964e693450 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
 		xprintf = trace;
 #endif
 
+	claim_resources();
+
 	signal(SIGHUP, trigger_reopen_log);
 	if (tracefile)
 		tracefile = talloc_strdup(NULL, tracefile);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 1467270476..ac26973648 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -255,6 +255,9 @@ void daemonize(void);
 /* Close stdin/stdout/stderr to complete daemonize */
 void finish_daemonize(void);
 
+/* Set OOM-killer score and raise ulimit. */
+void claim_resources(void);
+
 /* Open a pipe for signal handling */
 void init_pipe(int reopen_log_pipe[2]);
 
diff --git a/tools/xenstore/xenstored_minios.c b/tools/xenstore/xenstored_minios.c
index c94493e52a..df8ff580b0 100644
--- a/tools/xenstore/xenstored_minios.c
+++ b/tools/xenstore/xenstored_minios.c
@@ -32,6 +32,10 @@ void finish_daemonize(void)
 {
 }
 
+void claim_resources(void)
+{
+}
+
 void init_pipe(int reopen_log_pipe[2])
 {
 	reopen_log_pipe[0] = -1;
diff --git a/tools/xenstore/xenstored_posix.c b/tools/xenstore/xenstored_posix.c
index 48c37ffe3e..0074fbd8b2 100644
--- a/tools/xenstore/xenstored_posix.c
+++ b/tools/xenstore/xenstored_posix.c
@@ -22,6 +22,7 @@
 #include <fcntl.h>
 #include <stdlib.h>
 #include <sys/mman.h>
+#include <sys/resource.h>
 
 #include "utils.h"
 #include "xenstored_core.h"
@@ -87,6 +88,51 @@ void finish_daemonize(void)
 	close(devnull);
 }
 
+static void avoid_oom_killer(void)
+{
+	char path[32];
+	char val[] = "-500";
+	int fd;
+
+	snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)getpid());
+
+	fd = open(path, O_WRONLY);
+	/* Do nothing if file doesn't exist. */
+	if (fd < 0)
+		return;
+	/* Ignore errors. */
+	write(fd, val, sizeof(val));
+	close(fd);
+}
+
+/* Max. 32752 domains with 2 open files per domain, plus some spare. */
+#define MAX_FILES 65536
+static void raise_ulimit(void)
+{
+	struct rlimit rlim;
+
+	if (getrlimit(RLIMIT_NOFILE, &rlim))
+		return;
+	if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_max < MAX_FILES)
+	{
+		rlim.rlim_max = MAX_FILES;
+		setrlimit(RLIMIT_NOFILE, &rlim);
+	}
+	if (getrlimit(RLIMIT_NOFILE, &rlim))
+		return;
+	if (rlim.rlim_max == RLIM_INFINITY || rlim.rlim_max > MAX_FILES)
+		rlim.rlim_cur = MAX_FILES;
+	else
+		rlim.rlim_cur = rlim.rlim_max;
+	setrlimit(RLIMIT_NOFILE, &rlim);
+}
+
+void claim_resources(void)
+{
+	avoid_oom_killer();
+	raise_ulimit();
+}
+
 void init_pipe(int reopen_log_pipe[2])
 {
 	int flags;
-- 
2.26.2


Re: [PATCH] tools/xenstore: claim resources when running as daemon
Posted by Julien Grall 3 years, 6 months ago
Hi Juergen,

On 14/05/2021 09:41, Juergen Gross wrote:
> Xenstored is absolutely mandatory for a Xen host and it can't be
> restarted, so being killed by OOM-killer in case of memory shortage is
> to be avoided.
> 
> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
> xenstored to use large amounts of memory without being killed.
> 
> In order to support large numbers of domains the limit for open file
> descriptors might need to be raised. Each domain needs 2 file
> descriptors (one for the event channel and one for the xl per-domain
> daemon to connect to xenstored).

Hmmm... AFAICT there is only one file descriptor to handle all the event 
channels. Could you point out the code showing one event FD per domain?

> 
> Try to raise ulimit for open files to 65536. First the hard limit if
> needed, and then the soft limit.

I am not sure this is right to impose this limit to everyone. For 
instance, one admin may know that there will be no more than 100 domains 
on its system.

So the admin should be able to configure them. At this point, I think 
the two limit should be set my the initscript rather than xenstored itself.

This would also avoid the problem where Xenstored is not allowed to 
modify its limit (see more below).

> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   tools/xenstore/xenstored_core.c   |  2 ++
>   tools/xenstore/xenstored_core.h   |  3 ++
>   tools/xenstore/xenstored_minios.c |  4 +++
>   tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
>   4 files changed, 55 insertions(+)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index b66d119a98..964e693450 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>   		xprintf = trace;
>   #endif
>   
> +	claim_resources();
> +
>   	signal(SIGHUP, trigger_reopen_log);
>   	if (tracefile)
>   		tracefile = talloc_strdup(NULL, tracefile);
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 1467270476..ac26973648 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -255,6 +255,9 @@ void daemonize(void);
>   /* Close stdin/stdout/stderr to complete daemonize */
>   void finish_daemonize(void);
>   
> +/* Set OOM-killer score and raise ulimit. */
> +void claim_resources(void);
> +
>   /* Open a pipe for signal handling */
>   void init_pipe(int reopen_log_pipe[2]);
>   
> diff --git a/tools/xenstore/xenstored_minios.c b/tools/xenstore/xenstored_minios.c
> index c94493e52a..df8ff580b0 100644
> --- a/tools/xenstore/xenstored_minios.c
> +++ b/tools/xenstore/xenstored_minios.c
> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>   {
>   }
>   
> +void claim_resources(void)
> +{
> +}
> +
>   void init_pipe(int reopen_log_pipe[2])
>   {
>   	reopen_log_pipe[0] = -1;
> diff --git a/tools/xenstore/xenstored_posix.c b/tools/xenstore/xenstored_posix.c
> index 48c37ffe3e..0074fbd8b2 100644
> --- a/tools/xenstore/xenstored_posix.c
> +++ b/tools/xenstore/xenstored_posix.c
> @@ -22,6 +22,7 @@
>   #include <fcntl.h>
>   #include <stdlib.h>
>   #include <sys/mman.h>
> +#include <sys/resource.h>
>   
>   #include "utils.h"
>   #include "xenstored_core.h"
> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>   	close(devnull);
>   }
>   
> +static void avoid_oom_killer(void)
> +{
> +	char path[32];
> +	char val[] = "-500";
> +	int fd;
> +
> +	snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)getpid());

This looks Linux specific. How about other OSes?

> +
> +	fd = open(path, O_WRONLY);
> +	/* Do nothing if file doesn't exist. */

Your commit message leads to think that we *must* configure the OOM. If 
not, then we should not continue. But here, this suggest this is 
optional. In fact...

> +	if (fd < 0)
> +		return;
> +	/* Ignore errors. */
> +	write(fd, val, sizeof(val));

... xenstored may not be allowed to modify its own parameters. So this 
would continue silently without the admin necessarily knowning the limit 
wasn't applied.

> +	close(fd);
> +}
> +
> +/* Max. 32752 domains with 2 open files per domain, plus some spare. */
> +#define MAX_FILES 65536
> +static void raise_ulimit(void)
> +{
> +	struct rlimit rlim;
> +
> +	if (getrlimit(RLIMIT_NOFILE, &rlim))
> +		return;
> +	if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_max < MAX_FILES)
> +	{
> +		rlim.rlim_max = MAX_FILES;
> +		setrlimit(RLIMIT_NOFILE, &rlim);
> +	}
> +	if (getrlimit(RLIMIT_NOFILE, &rlim))
> +		return;
> +	if (rlim.rlim_max == RLIM_INFINITY || rlim.rlim_max > MAX_FILES)
> +		rlim.rlim_cur = MAX_FILES;
> +	else
> +		rlim.rlim_cur = rlim.rlim_max;
> +	setrlimit(RLIMIT_NOFILE, &rlim);
> +}
> +
> +void claim_resources(void)
> +{
> +	avoid_oom_killer();
> +	raise_ulimit();
> +}
> +
>   void init_pipe(int reopen_log_pipe[2])
>   {
>   	int flags;
> 

Cheers,

-- 
Julien Grall

Re: [PATCH] tools/xenstore: claim resources when running as daemon
Posted by Juergen Gross 3 years, 6 months ago
On 14.05.21 22:19, Julien Grall wrote:
> Hi Juergen,
> 
> On 14/05/2021 09:41, Juergen Gross wrote:
>> Xenstored is absolutely mandatory for a Xen host and it can't be
>> restarted, so being killed by OOM-killer in case of memory shortage is
>> to be avoided.
>>
>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>> xenstored to use large amounts of memory without being killed.
>>
>> In order to support large numbers of domains the limit for open file
>> descriptors might need to be raised. Each domain needs 2 file
>> descriptors (one for the event channel and one for the xl per-domain
>> daemon to connect to xenstored).
> 
> Hmmm... AFAICT there is only one file descriptor to handle all the event 
> channels. Could you point out the code showing one event FD per domain?

I have let me fooled by just counting the file descriptors used with one
or two domain active.

So you are right that all event channels only use one fd, but each xl
daemon will use two (which should be fixed, IMO). And thinking more
about it it is even worse: each qemu process will at least require one
additional fd.

> 
>>
>> Try to raise ulimit for open files to 65536. First the hard limit if
>> needed, and then the soft limit.
> 
> I am not sure this is right to impose this limit to everyone. For 
> instance, one admin may know that there will be no more than 100 domains 
> on its system.

Is setting a higher limit really a problem?

> So the admin should be able to configure them. At this point, I think 
> the two limit should be set my the initscript rather than xenstored itself.

But the admin would need to know the Xen internals for selecting the
correct limits. In the end I'd be fine with moving this modification to
the script starting Xenstore (which would be launch-xenstore), but the
configuration item should be "max number of domains to support".

> 
> This would also avoid the problem where Xenstored is not allowed to 
> modify its limit (see more below).
> 
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/xenstore/xenstored_core.c   |  2 ++
>>   tools/xenstore/xenstored_core.h   |  3 ++
>>   tools/xenstore/xenstored_minios.c |  4 +++
>>   tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
>>   4 files changed, 55 insertions(+)
>>
>> diff --git a/tools/xenstore/xenstored_core.c 
>> b/tools/xenstore/xenstored_core.c
>> index b66d119a98..964e693450 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>           xprintf = trace;
>>   #endif
>> +    claim_resources();
>> +
>>       signal(SIGHUP, trigger_reopen_log);
>>       if (tracefile)
>>           tracefile = talloc_strdup(NULL, tracefile);
>> diff --git a/tools/xenstore/xenstored_core.h 
>> b/tools/xenstore/xenstored_core.h
>> index 1467270476..ac26973648 100644
>> --- a/tools/xenstore/xenstored_core.h
>> +++ b/tools/xenstore/xenstored_core.h
>> @@ -255,6 +255,9 @@ void daemonize(void);
>>   /* Close stdin/stdout/stderr to complete daemonize */
>>   void finish_daemonize(void);
>> +/* Set OOM-killer score and raise ulimit. */
>> +void claim_resources(void);
>> +
>>   /* Open a pipe for signal handling */
>>   void init_pipe(int reopen_log_pipe[2]);
>> diff --git a/tools/xenstore/xenstored_minios.c 
>> b/tools/xenstore/xenstored_minios.c
>> index c94493e52a..df8ff580b0 100644
>> --- a/tools/xenstore/xenstored_minios.c
>> +++ b/tools/xenstore/xenstored_minios.c
>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>   {
>>   }
>> +void claim_resources(void)
>> +{
>> +}
>> +
>>   void init_pipe(int reopen_log_pipe[2])
>>   {
>>       reopen_log_pipe[0] = -1;
>> diff --git a/tools/xenstore/xenstored_posix.c 
>> b/tools/xenstore/xenstored_posix.c
>> index 48c37ffe3e..0074fbd8b2 100644
>> --- a/tools/xenstore/xenstored_posix.c
>> +++ b/tools/xenstore/xenstored_posix.c
>> @@ -22,6 +22,7 @@
>>   #include <fcntl.h>
>>   #include <stdlib.h>
>>   #include <sys/mman.h>
>> +#include <sys/resource.h>
>>   #include "utils.h"
>>   #include "xenstored_core.h"
>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>       close(devnull);
>>   }
>> +static void avoid_oom_killer(void)
>> +{
>> +    char path[32];
>> +    char val[] = "-500";
>> +    int fd;
>> +
>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>> (int)getpid());
> 
> This looks Linux specific. How about other OSes?

I don't know whether other OSes have an OOM killer, and if they do, how
to configure it. It is a best effort attempt, after all.

> 
>> +
>> +    fd = open(path, O_WRONLY);
>> +    /* Do nothing if file doesn't exist. */
> 
> Your commit message leads to think that we *must* configure the OOM. If 
> not, then we should not continue. But here, this suggest this is 
> optional. In fact...

I can modify the commit message by adding a "Try to".

> 
>> +    if (fd < 0)
>> +        return;
>> +    /* Ignore errors. */
>> +    write(fd, val, sizeof(val));
> 
> ... xenstored may not be allowed to modify its own parameters. So this 
> would continue silently without the admin necessarily knowning the limit 
> wasn't applied.

I can add a line in the Xenstore log in this regard.


Juergen
Re: [PATCH] tools/xenstore: claim resources when running as daemon
Posted by Julien Grall 3 years, 6 months ago
Hi Juergen,

On 17/05/2021 07:47, Juergen Gross wrote:
> On 14.05.21 22:19, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 14/05/2021 09:41, Juergen Gross wrote:
>>> Xenstored is absolutely mandatory for a Xen host and it can't be
>>> restarted, so being killed by OOM-killer in case of memory shortage is
>>> to be avoided.
>>>
>>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>>> xenstored to use large amounts of memory without being killed.
>>>
>>> In order to support large numbers of domains the limit for open file
>>> descriptors might need to be raised. Each domain needs 2 file
>>> descriptors (one for the event channel and one for the xl per-domain
>>> daemon to connect to xenstored).
>>
>> Hmmm... AFAICT there is only one file descriptor to handle all the 
>> event channels. Could you point out the code showing one event FD per 
>> domain?
> 
> I have let me fooled by just counting the file descriptors used with one
> or two domain active.
> 
> So you are right that all event channels only use one fd, but each xl
> daemon will use two (which should be fixed, IMO). And thinking more
> about it it is even worse: each qemu process will at least require one
> additional fd.
> 
>>
>>>
>>> Try to raise ulimit for open files to 65536. First the hard limit if
>>> needed, and then the soft limit.
>>
>> I am not sure this is right to impose this limit to everyone. For 
>> instance, one admin may know that there will be no more than 100 
>> domains on its system.
> 
> Is setting a higher limit really a problem?

I am quite unease to set a limit that nearly nobody will reach unless 
something went horribly wrong on the system.

> 
>> So the admin should be able to configure them. At this point, I think 
>> the two limit should be set my the initscript rather than xenstored 
>> itself.
> 
> But the admin would need to know the Xen internals for selecting the
> correct limits. In the end I'd be fine with moving this modification to
> the script starting Xenstore (which would be launch-xenstore), but the
> configuration item should be "max number of domains to support".

I would be fine with "max numer of domains to support". What I care the 
most here is the limits are actually applied most of (if not all) the time.

> 
>>
>> This would also avoid the problem where Xenstored is not allowed to 
>> modify its limit (see more below).
>>
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   tools/xenstore/xenstored_core.c   |  2 ++
>>>   tools/xenstore/xenstored_core.h   |  3 ++
>>>   tools/xenstore/xenstored_minios.c |  4 +++
>>>   tools/xenstore/xenstored_posix.c  | 46 +++++++++++++++++++++++++++++++
>>>   4 files changed, 55 insertions(+)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c 
>>> b/tools/xenstore/xenstored_core.c
>>> index b66d119a98..964e693450 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>>           xprintf = trace;
>>>   #endif
>>> +    claim_resources();
>>> +
>>>       signal(SIGHUP, trigger_reopen_log);
>>>       if (tracefile)
>>>           tracefile = talloc_strdup(NULL, tracefile);
>>> diff --git a/tools/xenstore/xenstored_core.h 
>>> b/tools/xenstore/xenstored_core.h
>>> index 1467270476..ac26973648 100644
>>> --- a/tools/xenstore/xenstored_core.h
>>> +++ b/tools/xenstore/xenstored_core.h
>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>>   /* Close stdin/stdout/stderr to complete daemonize */
>>>   void finish_daemonize(void);
>>> +/* Set OOM-killer score and raise ulimit. */
>>> +void claim_resources(void);
>>> +
>>>   /* Open a pipe for signal handling */
>>>   void init_pipe(int reopen_log_pipe[2]);
>>> diff --git a/tools/xenstore/xenstored_minios.c 
>>> b/tools/xenstore/xenstored_minios.c
>>> index c94493e52a..df8ff580b0 100644
>>> --- a/tools/xenstore/xenstored_minios.c
>>> +++ b/tools/xenstore/xenstored_minios.c
>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>>   {
>>>   }
>>> +void claim_resources(void)
>>> +{
>>> +}
>>> +
>>>   void init_pipe(int reopen_log_pipe[2])
>>>   {
>>>       reopen_log_pipe[0] = -1;
>>> diff --git a/tools/xenstore/xenstored_posix.c 
>>> b/tools/xenstore/xenstored_posix.c
>>> index 48c37ffe3e..0074fbd8b2 100644
>>> --- a/tools/xenstore/xenstored_posix.c
>>> +++ b/tools/xenstore/xenstored_posix.c
>>> @@ -22,6 +22,7 @@
>>>   #include <fcntl.h>
>>>   #include <stdlib.h>
>>>   #include <sys/mman.h>
>>> +#include <sys/resource.h>
>>>   #include "utils.h"
>>>   #include "xenstored_core.h"
>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>>       close(devnull);
>>>   }
>>> +static void avoid_oom_killer(void)
>>> +{
>>> +    char path[32];
>>> +    char val[] = "-500";
>>> +    int fd;
>>> +
>>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>>> (int)getpid());
>>
>> This looks Linux specific. How about other OSes?
> 
> I don't know whether other OSes have an OOM killer, and if they do, how
> to configure it. It is a best effort attempt, after all.

I have CCed Roger who should be able to help for FreeBSD.

> 
>>
>>> +
>>> +    fd = open(path, O_WRONLY);
>>> +    /* Do nothing if file doesn't exist. */
>>
>> Your commit message leads to think that we *must* configure the OOM. 
>> If not, then we should not continue. But here, this suggest this is 
>> optional. In fact...
> 
> I can modify the commit message by adding a "Try to".
> 
>>
>>> +    if (fd < 0)
>>> +        return;
>>> +    /* Ignore errors. */
>>> +    write(fd, val, sizeof(val));
>>
>> ... xenstored may not be allowed to modify its own parameters. So this 
>> would continue silently without the admin necessarily knowning the 
>> limit wasn't applied.
> 
> I can add a line in the Xenstore log in this regard.

This feels wrong to me. If a limit cannot be applied then it should fail 
early rather than possibly at the wrong moment a few days (months?) after.

Cheers,

-- 
Julien Grall

Re: [PATCH] tools/xenstore: claim resources when running as daemon
Posted by Juergen Gross 3 years, 6 months ago
On 17.05.21 17:55, Julien Grall wrote:
> Hi Juergen,
> 
> On 17/05/2021 07:47, Juergen Gross wrote:
>> On 14.05.21 22:19, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 14/05/2021 09:41, Juergen Gross wrote:
>>>> Xenstored is absolutely mandatory for a Xen host and it can't be
>>>> restarted, so being killed by OOM-killer in case of memory shortage is
>>>> to be avoided.
>>>>
>>>> Set /proc/$pid/oom_score_adj (if available) to -500 in order to allow
>>>> xenstored to use large amounts of memory without being killed.
>>>>
>>>> In order to support large numbers of domains the limit for open file
>>>> descriptors might need to be raised. Each domain needs 2 file
>>>> descriptors (one for the event channel and one for the xl per-domain
>>>> daemon to connect to xenstored).
>>>
>>> Hmmm... AFAICT there is only one file descriptor to handle all the 
>>> event channels. Could you point out the code showing one event FD per 

>>> domain?
>>
>> I have let me fooled by just counting the file descriptors used with one
>> or two domain active.
>>
>> So you are right that all event channels only use one fd, but each xl
>> daemon will use two (which should be fixed, IMO). And thinking more
>> about it it is even worse: each qemu process will at least require one
>> additional fd.
>>
>>>
>>>>
>>>> Try to raise ulimit for open files to 65536. First the hard limit if
>>>> needed, and then the soft limit.
>>>
>>> I am not sure this is right to impose this limit to everyone. For 
>>> instance, one admin may know that there will be no more than 100 
>>> domains on its system.
>>
>> Is setting a higher limit really a problem?
> 
> I am quite unease to set a limit that nearly nobody will reach unless 
> something went horribly wrong on the system.

Hmm, I really don't see the downside of having the possibility to let
xenstored open lots of files.

Anyway we can do it via prlimit if you like that better.

> 
>>
>>> So the admin should be able to configure them. At this point, I think 

>>> the two limit should be set my the initscript rather than xenstored 
>>> itself.
>>
>> But the admin would need to know the Xen internals for selecting the
>> correct limits. In the end I'd be fine with moving this modification to
>> the script starting Xenstore (which would be launch-xenstore), but the
>> configuration item should be "max number of domains to support".
> 
> I would be fine with "max numer of domains to support". What I care the 

> most here is the limits are actually applied most of (if not all) the time.

I did another test and found:

- the xl daemon for a guest will use 2 socket connections
- qemu for a HVM guest will use 3 socket connections
- qemu for a PV guest is using one socket connection
- 14 other files are used by xenstored

So we should set the limit to 5 * n_dom + 100 (some headroom will be
nice IMO).

> 
>>
>>>
>>> This would also avoid the problem where Xenstored is not allowed to 
>>> modify its limit (see more below).
>>>
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>   tools/xenstore/xenstored_core.c   |  2 ++
>>>>   tools/xenstore/xenstored_core.h   |  3 ++
>>>>   tools/xenstore/xenstored_minios.c |  4 +++
>>>>   tools/xenstore/xenstored_posix.c  | 46 
>>>> +++++++++++++++++++++++++++++++
>>>>   4 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>> b/tools/xenstore/xenstored_core.c
>>>> index b66d119a98..964e693450 100644
>>>> --- a/tools/xenstore/xenstored_core.c
>>>> +++ b/tools/xenstore/xenstored_core.c
>>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>>>           xprintf = trace;
>>>>   #endif
>>>> +    claim_resources();
>>>> +
>>>>       signal(SIGHUP, trigger_reopen_log);
>>>>       if (tracefile)
>>>>           tracefile = 
talloc_strdup(NULL, tracefile);
>>>> diff --git a/tools/xenstore/xenstored_core.h 
>>>> b/tools/xenstore/xenstored_core.h
>>>> index 1467270476..ac26973648 100644
>>>> --- a/tools/xenstore/xenstored_core.h
>>>> +++ b/tools/xenstore/xenstored_core.h
>>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>>>   /* Close stdin/stdout/stderr to complete daemonize */
>>>>   void finish_daemonize(void);
>>>> +/* Set OOM-killer score and raise ulimit. */
>>>> +void claim_resources(void);
>>>> +
>>>>   /* Open a pipe for signal handling */
>>>>   void init_pipe(int reopen_log_pipe[2]);
>>>> diff --git a/tools/xenstore/xenstored_minios.c 
>>>> b/tools/xenstore/xenstored_minios.c
>>>> index c94493e52a..df8ff580b0 100644
>>>> --- a/tools/xenstore/xenstored_minios.c
>>>> +++ b/tools/xenstore/xenstored_minios.c
>>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>>>   {
>>>>   }
>>>> +void claim_resources(void)
>>>> +{
>>>> +}
>>>> +
>>>>   void init_pipe(int reopen_log_pipe[2])
>>>>   {
>>>>       reopen_log_pipe[0] = -1;
>>>> diff --git a/tools/xenstore/xenstored_posix.c 
>>>> b/tools/xenstore/xenstored_posix.c
>>>> index 48c37ffe3e..0074fbd8b2 100644
>>>> --- a/tools/xenstore/xenstored_posix.c
>>>> +++ b/tools/xenstore/xenstored_posix.c
>>>> @@ -22,6 +22,7 @@
>>>>   #include <fcntl.h>
>>>>   #include <stdlib.h>
>>>>   #include <sys/mman.h>
>>>> +#include <sys/resource.h>
>>>>   #include "utils.h"
>>>>   #include "xenstored_core.h"
>>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>>>       close(devnull);
>>>>   }
>>>> +static void avoid_oom_killer(void)
>>>> +{
>>>> +    char path[32];
>>>> +    char val[] = "-500";
>>>> +    int fd;
>>>> +
>>>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>>>> (int)getpid());
>>>
>>> This looks Linux specific. How about other OSes?
>>
>> I don't know whether other OSes have an OOM killer, and if they do, how
>> to configure it. It is a best effort attempt, after all.
> 
> I have CCed Roger who should be able to help for FreeBSD.

It would be possible to set the OOM-score from the launch script, too.

> 
>>
>>>
>>>> +
>>>> +    fd = open(path, O_WRONLY);
>>>> +    /* Do nothing if file doesn't exist. */
>>>
>>> Your commit message leads to think that we *must* configure the OOM. 
>>> If not, then we should not continue. But here, this suggest this is 
>>> optional. In fact...
>>
>> I can modify the commit message by adding a "Try to".
>>
>>>
>>>> +    if (fd < 0)
>>>> +        return;
>>>> +    /* Ignore errors. */
>>>> +    write(fd, val, sizeof(val));
>>>
>>> ... xenstored may not be allowed to modify its own parameters. So 
>>> this would continue silently without the admin necessarily knowning 
>>> the limit wasn't applied.
>>
>> I can add a line in the Xenstore log in this regard.
> 
> This feels wrong to me. If a limit cannot be applied then it should fail 
> early rather than possibly at the wrong moment a few days (months?) after.

I think issuing a warning would be better here. We are running with
no OOM adjustments since years now.


Juergen
Re: [PATCH] tools/xenstore: claim resources when running as daemon
Posted by Julien Grall 3 years, 6 months ago
Hi Juergen,

On 18/05/2021 07:43, Juergen Gross wrote:
> On 17.05.21 17:55, Julien Grall wrote:
>>
>>>
>>>> So the admin should be able to configure them. At this point, I think 
> 
>>>> the two limit should be set my the initscript rather than xenstored 
>>>> itself.
>>>
>>> But the admin would need to know the Xen internals for selecting the
>>> correct limits. In the end I'd be fine with moving this modification to
>>> the script starting Xenstore (which would be launch-xenstore), but the
>>> configuration item should be "max number of domains to support".
>>
>> I would be fine with "max numer of domains to support". What I care the 
> 
>> most here is the limits are actually applied most of (if not all) the 
>> time.
> 
> I did another test and found:
> 
> - the xl daemon for a guest will use 2 socket connections
> - qemu for a HVM guest will use 3 socket connections
> - qemu for a PV guest is using one socket connection
> - 14 other files are used by xenstored
> 
> So we should set the limit to 5 * n_dom + 100 (some headroom will be
> nice IMO).

This looks fine to me.

> 
>>
>>>
>>>>
>>>> This would also avoid the problem where Xenstored is not allowed to 
>>>> modify its limit (see more below).
>>>>
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>> ---
>>>>>   tools/xenstore/xenstored_core.c   |  2 ++
>>>>>   tools/xenstore/xenstored_core.h   |  3 ++
>>>>>   tools/xenstore/xenstored_minios.c |  4 +++
>>>>>   tools/xenstore/xenstored_posix.c  | 46 
>>>>> +++++++++++++++++++++++++++++++
>>>>>   4 files changed, 55 insertions(+)
>>>>>
>>>>> diff --git a/tools/xenstore/xenstored_core.c 
>>>>> b/tools/xenstore/xenstored_core.c
>>>>> index b66d119a98..964e693450 100644
>>>>> --- a/tools/xenstore/xenstored_core.c
>>>>> +++ b/tools/xenstore/xenstored_core.c
>>>>> @@ -2243,6 +2243,8 @@ int main(int argc, char *argv[])
>>>>>           xprintf = trace;
>>>>>   #endif
>>>>> +    claim_resources();
>>>>> +
>>>>>       signal(SIGHUP, trigger_reopen_log);
>>>>>       if (tracefile)
>>>>>           tracefile = 
> talloc_strdup(NULL, tracefile);
>>>>> diff --git a/tools/xenstore/xenstored_core.h 
>>>>> b/tools/xenstore/xenstored_core.h
>>>>> index 1467270476..ac26973648 100644
>>>>> --- a/tools/xenstore/xenstored_core.h
>>>>> +++ b/tools/xenstore/xenstored_core.h
>>>>> @@ -255,6 +255,9 @@ void daemonize(void);
>>>>>   /* Close stdin/stdout/stderr to complete daemonize */
>>>>>   void finish_daemonize(void);
>>>>> +/* Set OOM-killer score and raise ulimit. */
>>>>> +void claim_resources(void);
>>>>> +
>>>>>   /* Open a pipe for signal handling */
>>>>>   void init_pipe(int reopen_log_pipe[2]);
>>>>> diff --git a/tools/xenstore/xenstored_minios.c 
>>>>> b/tools/xenstore/xenstored_minios.c
>>>>> index c94493e52a..df8ff580b0 100644
>>>>> --- a/tools/xenstore/xenstored_minios.c
>>>>> +++ b/tools/xenstore/xenstored_minios.c
>>>>> @@ -32,6 +32,10 @@ void finish_daemonize(void)
>>>>>   {
>>>>>   }
>>>>> +void claim_resources(void)
>>>>> +{
>>>>> +}
>>>>> +
>>>>>   void init_pipe(int reopen_log_pipe[2])
>>>>>   {
>>>>>       reopen_log_pipe[0] = -1;
>>>>> diff --git a/tools/xenstore/xenstored_posix.c 
>>>>> b/tools/xenstore/xenstored_posix.c
>>>>> index 48c37ffe3e..0074fbd8b2 100644
>>>>> --- a/tools/xenstore/xenstored_posix.c
>>>>> +++ b/tools/xenstore/xenstored_posix.c
>>>>> @@ -22,6 +22,7 @@
>>>>>   #include <fcntl.h>
>>>>>   #include <stdlib.h>
>>>>>   #include <sys/mman.h>
>>>>> +#include <sys/resource.h>
>>>>>   #include "utils.h"
>>>>>   #include "xenstored_core.h"
>>>>> @@ -87,6 +88,51 @@ void finish_daemonize(void)
>>>>>       close(devnull);
>>>>>   }
>>>>> +static void avoid_oom_killer(void)
>>>>> +{
>>>>> +    char path[32];
>>>>> +    char val[] = "-500";
>>>>> +    int fd;
>>>>> +
>>>>> +    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", 
>>>>> (int)getpid());
>>>>
>>>> This looks Linux specific. How about other OSes?
>>>
>>> I don't know whether other OSes have an OOM killer, and if they do, how
>>> to configure it. It is a best effort attempt, after all.
>>
>> I have CCed Roger who should be able to help for FreeBSD.
> 
> It would be possible to set the OOM-score from the launch script, too.

It would be ideal if all the limits are set from the launch script. At 
least, they can be changed by the admin and also possibly be enforced 
(if Xenstored is not allowed to do it).

> 
>>
>>>
>>>>
>>>>> +
>>>>> +    fd = open(path, O_WRONLY);
>>>>> +    /* Do nothing if file doesn't exist. */
>>>>
>>>> Your commit message leads to think that we *must* configure the OOM. 
>>>> If not, then we should not continue. But here, this suggest this is 
>>>> optional. In fact...
>>>
>>> I can modify the commit message by adding a "Try to".
>>>
>>>>
>>>>> +    if (fd < 0)
>>>>> +        return;
>>>>> +    /* Ignore errors. */
>>>>> +    write(fd, val, sizeof(val));
>>>>
>>>> ... xenstored may not be allowed to modify its own parameters. So 
>>>> this would continue silently without the admin necessarily knowning 
>>>> the limit wasn't applied.
>>>
>>> I can add a line in the Xenstore log in this regard.
>>
>> This feels wrong to me. If a limit cannot be applied then it should 
>> fail early rather than possibly at the wrong moment a few days 
>> (months?) after.
> 
> I think issuing a warning would be better here. We are running with
> no OOM adjustments since years now.

Right, this is a sign that the OOM adjustment is not necessary in most 
of the cases. But the fact you sent this patch suggests that you or 
someone else saw Xenstored crashing.

The idea with failing hard and early is the admin will directly be aware 
that didn't happen. It can then take action before it is too late (e.g. 
Xenstored was killed while VMs are running).

With the warning, I am worry the admin may not notice it because they 
are easy to miss.

Anyway, if the OOM adjustment is moved to the launch script, then there 
is less change it may fail (the launch script should have higher 
privilege than xenstored).

Cheers,

-- 
Julien Grall