Thomas Lamprecht [Fri, 15 Dec 2017 16:00:31 +0000 (17:00 +0100)]
fork_worker: factor out synced worker output mirroring
When running in sync mode (CLI environment) we mirror the worker's output
to both STDOUT and the task log file, similar to what the Unix
command line tool tee provides, thus we borrow its name for the
factored out sub method.
This moves ~60 lines of code out of the big fork_worker sub and makes
it easier to read and track what happens there.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
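A minimal sketch of the tee-like mirroring described in the commit above. The
sub name tee_lines and its signature are hypothetical and are not taken from
the fork_worker code; it only illustrates copying every line read from one
handle to several output handles.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each line from $in to every handle in @outs,
# similar in spirit to tee(1).
sub tee_lines {
    my ($in, @outs) = @_;

    while (defined(my $line = <$in>)) {
        for my $out (@outs) {
            print {$out} $line;
        }
    }
    return;
}
```

In the real worker the two outputs would be STDOUT and the task log file; any
pair of writable handles works with this sketch.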
Thomas Lamprecht [Fri, 15 Dec 2017 16:00:30 +0000 (17:00 +0100)]
fork_worker: use separate pipe for status messages
We forced line-wise flushing of the worker's STDOUT and STDERR to
capture the final status (TASK OK/TASK ERROR).
Thus, if the code executed in the worker wanted to flush explicitly,
e.g., when the last output wasn't newline terminated but needed to
reach the user's eyes, the parent just ignored that.
This leads to confusing results in CLI handlers using fork_worker.
So remove the buffering logic completely and introduce a separate
pipe for sending the final status.
Said pipe is read once, after the child closes (EOF) its STDOUT.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
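The idea of a dedicated status pipe can be sketched as follows. This is an
illustration under assumptions, not the fork_worker implementation: the sub
name run_worker_sketch is hypothetical, and the worker output goes through a
plain pipe instead of the real STDOUT redirection.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: one pipe carries the worker output unmodified,
# a second pipe carries only the final status line.
sub run_worker_sketch {
    my ($code) = @_;

    pipe(my $out_r, my $out_w) or die "pipe failed: $!\n";
    pipe(my $status_r, my $status_w) or die "pipe failed: $!\n";

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);

    if ($pid == 0) {
        # child: run the payload, then report the status separately
        close($out_r);
        close($status_r);
        my $ok = eval { $code->($out_w); 1 };
        close($out_w);
        syswrite($status_w, $ok ? "TASK OK\n" : "TASK ERROR: $@");
        POSIX::_exit($ok ? 0 : 1);
    }

    # parent: close the write ends so that EOF can be detected
    close($out_w);
    close($status_w);

    my $output = '';
    while (sysread($out_r, my $buf, 4096)) {
        $output .= $buf; # passed through as-is, no line buffering
    }
    my $status = <$status_r>; # read once, after EOF on the output pipe
    waitpid($pid, 0);

    return ($output, $status);
}
```

Because the status no longer travels in the output stream, output that does
not end in a newline can be passed through as-is.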
Thomas Lamprecht [Fri, 15 Dec 2017 16:00:29 +0000 (17:00 +0100)]
fork_worker: refactor passing $upid to parent for sync
STDOUT and $psync[1] are the same here, so no need to differ.
Also we do this only for letting the parent know that we're ready, the
parent knows the UPID already as it was generated before forking.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 15 Dec 2017 05:41:49 +0000 (06:41 +0100)]
ticket: raise UNAUTHORIZED not FORBIDDEN in verify subs
In the ticket and CSRF prevention token verification methods we used
a raise_perm exception to tell our caller about a failure of such a
verification. raise_perm uses HTTP_FORBIDDEN (403) as code.
Earlier, all such exceptions or dies were caught when the AnyEvent
HTTP server called the auth_handler method and transformed to
HTTP_UNAUTHORIZED (401).
With commit d8327719e353198a1dffad88c246fee065054a6b from
pve-http-server we gained the ability to tell a client about a server
internal 5XX error, so that clients do not get wrongly logged out if
we have an internal error.
As a side effect, the exceptions of the verify_rsa_ticket and
verify_csrf_prevention_token sub methods were passed on to the client.
If an old, now invalid, ticket was sent to the server, a client got
403 (FORBIDDEN) instead of the 401 (UNAUTHORIZED) it was used to. This
suggested that the client had done something wrong, when it merely
needed to log in again.
As we are not yet logged in here, and thus cannot possibly know if
the call is forbidden or not, HTTP_FORBIDDEN seems the wrong code.
Change it to HTTP_UNAUTHORIZED, which restores the code we have always
told API clients and is the correct one here.
Also, RFC 2068 section 10.4.4 [1] indicates that FORBIDDEN was not
really correct for the aforementioned verify methods:
> 403 Forbidden
>
> The server understood the request, but is refusing to fulfill it.
> Authorization will not help and the request SHOULD NOT be
> repeated. [...]
With an invalid ticket or CSRF prevention token we have an
authorization problem for the current call, not a permission problem
(we may have, but we can't tell yet).
* Cancel on Ctrl+C (die())
* Finish on Ctrl+D (eof/eot) without appending a newline
* Also finish on \n to be sure.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Most times a port was requested for a specified IP family (v4, v6)
only. Thus also ensure that the port from the respective family got
ready, else we may return on a false positive.
As we had no user setting the $timeout param we can add the $family
param as second one, it'll get used more often, so no need to put it
at the back.
As we do nothing if it is not defined, this does not change the behavior
for our users yet.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 10 Nov 2017 11:09:27 +0000 (12:09 +0100)]
daemon: don't send SIGTERM before restart on leave_children_open_on_reload
Else this option is not really useful. First, sending a SIGTERM lets
the children exit, not quite what "leave_children_open_on_reload"
promises.
The problem this causes is that we may get a time window where no
worker is active and thus, for example, our API daemon would not
accept connections during a restart (or better said, reload).
So, don't request termination of any child worker if this option is
set, but rather just restart (re-exec) ourselves, start up a new set of
workers and only then request the termination of the old ones,
allowing a fully seamless reload.
This is only done on `$daemon-exe restart` and thus on
`systemctl reload $daemon`; `systemctl restart` or any other stop/start
cycle always exits all workers first.
This expects that the worker can do a graceful termination on
SIGTERM, which is already the case for anything using our AnyEvent
based class (which is the base of our HTTPServer module).
Graceful termination means the following: the worker accepts
no new work and exits immediately after the currently queued work is
done.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 10 Nov 2017 09:24:25 +0000 (10:24 +0100)]
lock_file_full: add missing trailing newline
When we do not instantly get the lock we print a respective message
to stderr. This also shows up in the task logs, and if it's the last
message before a 'Task OK' the UI gets confused and shows the task as
erroneous.
Keep the message, as it's good feedback for the user to see why an op
seems to do nothing, so simply add a trailing newline.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Use double hyphens when prefixing command options in the documentation
This makes our man pages follow the GNU long option recommendations
where non-single character options are prefixed with a double hyphen
(https://www.gnu.org/software/libc/manual/html_node/Argument-Syntax.html)
The benefit for PVE is that our documentation looks more similar to what
a user with previous Linux knowledge is used to.
Our bash autocompletion helper only completes options using double hyphens too.
Thomas Lamprecht [Mon, 11 Sep 2017 08:41:34 +0000 (10:41 +0200)]
Tools: add `convert_size` for generic byte conversion
We often need to convert between file sizes, for formatting output,
but also internally in code. Some methods expect kilobytes, some gigabytes
and sometimes we need bytes.
While conversion from bigger to smaller units (e.g., gigabytes to
kilobytes) can simply be done with a left-shift, the opposite conversion
may need more attention, depending on the context it is used in.
If we allocate disks this is quite critical. For example, if we need
to allocate a disk with size 1023 bytes using the
PVE::Storage::vdisk_alloc method (which expects kilobytes) a
right shift by 10 (<=> division by 1024) would result in "0", which
obviously fails.
Thus we round up the converted value if a remainder was lost on the
transformation in this new method. This behaviour is opt-out, to be
on the safe side.
The method can be used in a clear way, as it gives information about
the source and target unit size, unlike "$var *= 1024", which doesn't
give any direct information at all, if not commented or derived
somewhere from its context.
For example:
> my $size = convert_size($value, 'gb' => 'kb');
is clearer than:
> my $size = $value*1024*1024;
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
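The rounding behaviour described in the commit above can be illustrated with
a small sketch. The sub name convert_size_sketch, the unit table and the
fourth $no_round_up parameter are assumptions made for this example and may
differ from the actual method in PVE::Tools.

```perl
use strict;
use warnings;

# Assumed unit table: number of bits to shift relative to bytes.
my %UNIT_SHIFT = (b => 0, kb => 10, mb => 20, gb => 30, tb => 40, pb => 50);

# Hypothetical sketch of a unit conversion that rounds up when bits are
# lost, unless the caller opts out with $no_round_up.
sub convert_size_sketch {
    my ($value, $from, $to, $no_round_up) = @_;

    for my $unit ($from, $to) {
        die "unknown unit '$unit'\n" if !defined($UNIT_SHIFT{$unit});
    }

    my $diff = $UNIT_SHIFT{$to} - $UNIT_SHIFT{$from};

    # bigger to smaller unit (or same unit): left-shift, nothing is lost
    if ($diff <= 0) {
        return $value << abs($diff);
    }

    # smaller to bigger unit: right-shift and remember the lost bits
    my $remainder = $value & ((1 << $diff) - 1);
    my $result = $value >> $diff;
    $result++ if $remainder && !$no_round_up;

    return $result;
}
```

With this sketch, 1023 bytes convert to 1 kilobyte by default and to 0 when
rounding up is disabled, which is the allocation pitfall the commit message
describes.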
run_fork_with_timeout: do not overwrite global signal handlers
Perl's 'local' must either be used in front of each $SIG{...}
assignment or the variables must be put in a list, else it affects only
the first variable and the rest are *not* in local context.
This may cause weird behaviour where daemons seemingly do not get
terminating signals delivered correctly and thus may not shut down
gracefully anymore.
As we only send SIGINT to processes if a manual stop action gets
triggered, just catch this one here.
As this is a general method which allows passing an arbitrary code
payload we cannot sanely handle all signals here, so remove trapping
of all others besides SIGINT. If those need to be trapped, that should be
done by the caller on a case by case basis.
Fixes: #1495
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
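The scoping pitfall with 'local' can be reproduced with an ordinary package
hash instead of %SIG. The hash %handlers and both sub names are made up for
this illustration; the chained assignment is an assumed form of the bug, not
a quote of the original code.

```perl
use strict;
use warnings;

# Stand-in for %SIG, so the example does not touch real signal handlers.
our %handlers = (INT => 'default', TERM => 'default');

# Chained assignment: 'local' applies to $handlers{INT} only, so the
# assignment to $handlers{TERM} is permanent.
sub chained_local {
    local $handlers{INT} = $handlers{TERM} = 'temporary';
    return;
}

# List form: both elements are localized and restored on scope exit.
sub listed_local {
    local ($handlers{INT}, $handlers{TERM}) = ('temporary', 'temporary');
    return;
}
```

After chained_local() returns, the INT entry is restored while the TERM entry
keeps the temporary value; after listed_local() both entries are restored.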
Don't die because the tasklist could not be broadcast, just log the
error.
Else we may hinder all tasks from running with a quite confusing error (i.e.
"ipcc_send_rec: file to big").
This may happen if there are a lot of tasks running at once.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 27 Jun 2017 09:12:04 +0000 (11:12 +0200)]
use more reliable checks in wait_for_vnc_port
We ran into problems where this method returned too early, even if the
port wasn't actually ready yet. The reason for this is that we
checked /proc/net/tcp, which does not guarantee an always up-to-date
state of only those ports which are actually available, i.e. a port
could linger around (time-wait state) or appear even if it wasn't
accepting connections yet (as stated in the kernel docs:
/proc/net/tcp is seen as obsolete by the kernel devs).
Use the `ss` tool from the iproute2 package, it uses netlink to get
the current state and has switches where we can direct it to really
only get the state of those sockets which interest us currently.
I.e., we tell it to get only listening TCP sockets from the requested
port.
The only drawback is that we loop on a run_command, which is slower
than just reading a file. A single loop needs about 1ms here vs the
60µs on the /proc/net/tcp read. But this isn't an API call which is
used highly frequently, but rather once per noVNC console open.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
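A rough sketch of such a check with `ss`. The sub names are hypothetical and
the exact option set is an assumption: -H (no header), -t (TCP), -l
(listening), -n (numeric) plus an optional -4/-6 family switch and a source
port filter. The actual wait_for_vnc_port code may build its command
differently.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ss command for listening TCP sockets on
# the given port, optionally restricted to one IP family (4 or 6).
sub ss_listen_cmd {
    my ($port, $family) = @_;

    die "invalid port '$port'\n" if $port !~ /^\d+$/;
    die "invalid family '$family'\n"
        if defined($family) && $family !~ /^[46]$/;

    my $cmd = ['ss', '-H', '-t', '-l', '-n'];
    push @$cmd, "-$family" if defined($family);
    push @$cmd, "sport = :$port";

    return $cmd;
}

# Hypothetical helper: true if ss reports at least one matching socket.
sub port_is_listening {
    my ($port, $family) = @_;

    my $cmd = ss_listen_cmd($port, $family);
    open(my $fh, '-|', @$cmd) or die "cannot run ss: $!\n";
    my $found = 0;
    while (defined(my $line = <$fh>)) {
        $found = 1 if $line =~ /\S/; # any output line is a match
    }
    close($fh);

    return $found;
}
```

A caller would poll port_is_listening() in a loop until it returns true or a
timeout expires.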
Dominik Csapak [Tue, 13 Jun 2017 09:25:33 +0000 (11:25 +0200)]
trim event and check if empty
give a meaningful error if it is empty and disallow it instead of having
an implicit default (the default should be set by the component using
the calendarevent, not the calendarevent itself)
tools: next_unused_port: use IPPROTO_TCP explicitly
Otherwise perl tries to bind+listen on a UDP socket if the
TCP socket fails - which is a waste since we're looking for
TCP ports.
Additionally, since UDP doesn't support listen(), perl will
return EOPNOTSUPP instead of, say, EADDRINUSE. (We don't
care about the error in this code though.)
While it should be impossible to bind to a wildcard address
when the port is in use by any other address there's one
case where this is allowed, and that's when the port is in
use by an ipv6 address while trying to bind to an ipv4
wildcard.
This currently happens when qemu finds ::1 for the
'localhost' we pass to qemu's spice address while we're
resolving the local nodename via IPv4.
Thomas Lamprecht [Wed, 10 May 2017 13:03:45 +0000 (15:03 +0200)]
swap raw syscall numbers with syscall.ph for easier porting
Raw syscall numbers are not platform independent, so replace them
with the constants provided by the syscall.ph perl helper.
This makes reading the code easier as a nice side effect.
As syscall.ph is not an ordinary module and causes problems when it is
required by multiple modules, we add our own module PVE::Syscall which
loads it and allows exporting the necessary constants in a sane way.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tools: make file-locking aware of external exception sources
Previously an external exception (e.g. caused by a SIGALRM in code
which is already inside a run_with_timeout() call) could happen in
various places where we did not properly handle this situation.
For instance after calling $lock_func() but before reaching the cleanup
code. In this case a lock was leaked.
Additionally the code was broken in that it used perl's automatic hash
creation side effect ($a->{x}->{y} implicitly initializing $a->{x} with
an empty hash when it did not exist). The effect was that if our own
time out was triggered after the initial check for an existing file
handle inside $lock_func() happened (extremely rare since perl would have
to be running insanely slow), the cleanup did:
if (my $fh = $lock_handles->{$$}->{$filename}->{fh}) {
This recreated $lock_handles->{$$}->{$filename} as an empty hash.
A subsequent call to lock_file_full() will think a file descriptor
already exists because the check simply used:
if (!$lock_handles->{$$}->{$filename}) {
While this could have been a one-line fix for this one particular case,
we'd still not be taking external timeouts into account causing the
first issue described above.
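The autovivification side effect described above is easy to reproduce in
isolation. The following sketch uses made-up sub names and a dummy file name;
only the nested hash layout mirrors the snippet quoted in the commit message.

```perl
use strict;
use warnings;

# Reading a deeply nested key creates the intermediate hashes.
sub autoviv_demo {
    my $lock_handles = {};
    my ($pid, $filename) = ($$, '/tmp/example.lock');

    my $before = exists($lock_handles->{$pid}) ? 1 : 0;

    # rvalue access only, but it still creates the two outer levels
    my $fh = $lock_handles->{$pid}->{$filename}->{fh};

    my $after = exists($lock_handles->{$pid}->{$filename}) ? 1 : 0;

    # an empty hash reference is true, so a plain truth check is fooled
    my $naive_check = $lock_handles->{$pid}->{$filename} ? 1 : 0;

    return ($before, $after, $naive_check);
}

# Hypothetical safe lookup: test each level before descending further.
sub get_lock_fh {
    my ($lock_handles, $pid, $filename) = @_;

    my $per_pid = $lock_handles->{$pid} or return;
    my $entry = $per_pid->{$filename} or return;

    return $entry->{fh};
}
```

autoviv_demo() returns (0, 1, 1): the entry does not exist before the read,
exists afterwards, and passes a simple truth check although no file handle
was ever stored.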
get_options: handle array and scalar refs on decoding
get_options is for parsing CLI options; here we decode after using
Getopt, as we are not sure how well it handles already decoded data.
But as Getopt can produce references for the parsed data we must
handle them explicitly.
So check if we have an ARRAY or SCALAR reference and decode them
respectively.
All other reference types should not get returned from Getopt, so
error out on them.
This bug was seen when viewing backup jobs, as we save the job as a
command entry in /etc/pve/vzdump.cron and then parse it with this
function on reading.
Besides the use there, we use it in the RESTHandler package's
cli_handler sub method, so some CLI tools could possibly be affected
by this.
Fixes: 24197a9f6c698985b7255fbf7792b0b6bd8188b5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
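The reference handling described in the commit above could be sketched like
this. The sub name decode_opt_value is hypothetical and the encoding is
assumed to be UTF-8 for the example; get_options itself may use a different
encoding.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode a parsed option value, which may be a plain
# scalar, an ARRAY reference or a SCALAR reference.
sub decode_opt_value {
    my ($value) = @_;

    my $type = ref($value);
    return Encode::decode('UTF-8', $value) if !$type;

    if ($type eq 'ARRAY') {
        return [ map { Encode::decode('UTF-8', $_) } @$value ];
    } elsif ($type eq 'SCALAR') {
        my $decoded = Encode::decode('UTF-8', $$value);
        return \$decoded;
    }

    # no other reference type is expected from option parsing
    die "unexpected reference type '$type'\n";
}
```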
Add addr_to_ip and get_ip_from_hostname helpers to PVE::Network
The first helper, addr_to_ip, is based on Wolfgang's version of this
[0].
I just moved it from PVE::Tools to PVE::Network, as it seems a more
fitting place.
It uses getnameinfo to extract information from the paddr parameter,
which is a sockaddr struct.
It gets used in the second helper and in a bug fix series from
Wolfgang [1].
The second helper, get_ip_from_hostname, resolves a hostname to an
IP and checks that it isn't one from the 127/8 subnet reserved for
loopback. It will be used in get_remote_nodeip from PVE::Cluster and
for a bugfix in pvecm.
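A sketch of how the two helpers could work together, using the core Socket
module. The bodies are assumptions for illustration and not the PVE::Network
code; is_loopback_v4 and the _sketch suffix are made-up names, and trying
every resolved address is a choice of this example.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo NI_NUMERICHOST NIx_NOSERV SOCK_STREAM);

# Turn a packed sockaddr struct into a numeric IP string.
sub addr_to_ip {
    my ($addr) = @_;

    my ($err, $host) = getnameinfo($addr, NI_NUMERICHOST, NIx_NOSERV);
    die "failed to get numerical host address: $err\n" if $err;

    return $host;
}

# Hypothetical helper: true for addresses in 127.0.0.0/8.
sub is_loopback_v4 {
    my ($ip) = @_;
    return $ip =~ /^127\.\d+\.\d+\.\d+$/ ? 1 : 0;
}

# Resolve a hostname and return the first non-loopback address.
sub get_ip_from_hostname_sketch {
    my ($hostname) = @_;

    my ($err, @res) = getaddrinfo($hostname, undef, { socktype => SOCK_STREAM });
    die "hostname lookup '$hostname' failed: $err\n" if $err;

    for my $ai (@res) {
        my $ip = addr_to_ip($ai->{addr});
        return $ip if !is_loopback_v4($ip);
    }

    die "hostname '$hostname' only resolves to loopback addresses\n";
}
```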
CpuSets usually come from (or are built using) values read
from cgroups anyway. (Eg. for container balancing we only
use ids found in lxc/cpuset.effective_cpus.)
When the child process running the command got a signal or failed
to execute, the exit code was still undefined, as we only extract it
after the signal/failed to execute check.
This led to:
> Use of uninitialized value in numeric ne (!=) at
> /usr/share/perl5/PVE/API2/Qemu.pm line 1433.
errors if we used run_command's `noerr` param and checked for the
command's exit code.
So just default the exit code to -1 for such cases.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
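The defaulting of the exit code can be sketched as a small decoder for a
wait status like Perl's $?. The sub name decode_wait_status and its return
shape are hypothetical; run_command integrates this logic differently.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status into (exit code, error message).
# The exit code defaults to -1, so callers never see an undefined value.
sub decode_wait_status {
    my ($status) = @_;

    my $exitcode = -1; # default for "signal" and "failed to execute"
    my $err;

    if ($status == -1) {
        $err = "failed to execute";
    } elsif (my $sig = $status & 127) {
        $err = "got signal $sig";
    } else {
        $exitcode = $status >> 8;
        $err = "exit code $exitcode" if $exitcode != 0;
    }

    return ($exitcode, $err);
}
```

A caller using a `noerr`-style mode can then compare the returned exit code
numerically without triggering an "uninitialized value" warning.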