Thomas Lamprecht [Sun, 21 May 2023 10:06:18 +0000 (12:06 +0200)]
bump version to 7.99.0
use a pre-release like version as we got some breaking changes
planned for access control, so might be nice to get (most of) them in
a 8.0.0 for simpler versioned dependencies (>= 8~), but it's also
just a bit of an experiment to see how doing such things plays out,
in the end we can cope with whatever versioning for dependency as bug
fixes might make it necessary to have a more specific version
boundary anyway.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
And using TOTP apps that can scan QR codes is much better UX anyway,
but its to trivial to bother deprecating it now and we'd still depend
on libmime-base32-perl, so really nothing gained in removing or
rewriting in shell..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 21 May 2023 09:42:21 +0000 (11:42 +0200)]
buildsys: rework clean target, avoid doc-gen one
1. this really doesn't change often
2. the synopsis and opts should be in the owner repo anyway
3. the original one simply deleted all *.adoc files, far to
aggressive
Avoids pve-docs dependency for building the DSC (without having to
pass the ugly no-pre-clean option).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Previously `authentication_verify` just `die`d on error and
would only return a boolean whether `priv/tfa.cfg` needs
updating as a positive result.
Since we want to support locking TOTP as well as a general
TFA lock-out via the config, we also want to be able to tell
when this occurs. Most of it is handled by the TFA rust
crate already, but notifying users needs to be done on this
end instead.
In pve-rs we now have a different API for this:
`authentication_verify2`, which, instead of die()ing on
errors, always returns a hash containing the result as well
as the flags 'tfa-limit-reached' and 'totp-limit-reached'
which, if set, tell us to notify the user.
However, doing so will introduce new fields in the
`priv/tfa.cfg` in a struct marked as `deny_unknown_fields`,
so in a cluster, the limits & notification handling should
only be done once we can be sure that all nodes are up to
date.
These fields are only introduced on login errors, so for
now, handle a failed result early without saving
`priv/tfa.cfg`.
The only case where saving the file was previously required
was when *successfully* logging in with a recovery key, by
which we cannot be reaching a limit, so this should still be
safe.
Once we can validate that all cluster nodes are up to date,
we can implement the notification system.
A commented-out code structure for this is included in this
patch.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Dominik Csapak [Thu, 23 Mar 2023 13:14:29 +0000 (14:14 +0100)]
fix #4609: allow valid DN in ldap/ad realm config
We previously added support for ',' in the DNS attribute through
allowing a quoted format, but the regex used was made too
restrictive.
In the new quoted attribute we'd only allow \w (alphanumeric and _)
and the restricted characters. This patch now changes that to allow
everything except the quotation mark " itself, which is again closer
to the original regex which did not care for quotation and allowed
everything aside from ','.
The unquoted attributes did not allow spaces anymore, but the RFC [0]
actually makes it clear that spaces are only forbidden at the
beginning and the end (same for #). So, fix the regex to accommodate
for that and allow space and # characters again if not at the end or
beginning.
0: https://www.ietf.org/rfc/rfc2253.txt
Fixes: 1aa2355 ("ldap: Allow quoted values for DN attribute values") Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Friedrich Weber <f.weber@proxmox.com>
[ T: make fixes a trailer and rework commit message ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Christoph Heiss [Tue, 31 Jan 2023 12:50:42 +0000 (13:50 +0100)]
ldap: Allow quoted values for DN attribute values
This fixes #3748 by allowing reserved characters in `bind_dn` (and
other properties of the same format) if they are properly quoted and
adds some corresponding documentation regarding that.
This was tested by setting up a slapd server and creating a user with
the CN `Test, User` much like in the bug report, then using this user
as `bind_dn` in the sync options. I also tested some variants of that
CN, including just `TestUser`.)
One thing that still won't work is syncing of LDAP users with colons
or slashes in their CNs. In such cases, the message
> value 'Test, User@ldap' does not look like a valid user name
will pop up.
This is due to spaces and colons being explicitly disallowed in
usernames by PVE access-control's username schema. This probably
means that such names can never be allowed, which is being documented
too as separate pve-docs patch.
Note that while this is now a bit more strict for some cases too,
they should not matter in practice. For context; see RFC 2253 [0],
section 4. Interestingly, this document was obsoleted by RFC 4514 [1]
in 2006, which only mentions this in section 2.4 ("Converting an
AttributeValue from ASN.1 to a String") and appendix A ("Presentation
Issues").
But the first one seems to be the "authoritive" document on this
matter, at least looking at some other docs about LDAP DNs (RedHat,
Microsoft, ..).
by switching to a tree-based in-memory structure, like we do in PBS.
instead of parsing ACL entries into a hash using the full ACL path as key for
each entry, parse them into a tree-like nested hash. when evaluating ACLs,
iterating over all path prefixes starting at '/' is needed anyway, so this is a
more natural way to store and access the parsed configuration.
some performance data, timing `pveum user permissions $user > /dev/null` for
various amounts of ACL entries in user.cfg
similarly, an /access/ticket request such as the one happening on login goes
down from 4.27s to 109ms with 2k entries (testing with 20k entries fails
because the request times out after 30s, but with the patch it takes 336ms).
the underlying issue is that these two code paths not only iterate over *all
defined ACL paths* to get a complete picture of a user's/token's privileges,
but the fact that that ACL computation for each checked path itself did another
such loop in PVE::AccessControl::roles().
it is enough to iterate over the to-be-checked ACL path in a component-wise
fashion in order to handle role propagation, e.g., when looking at /a/b/c/d,
iterate over
Dominik Csapak [Fri, 21 Oct 2022 08:31:17 +0000 (10:31 +0200)]
authenticate_user: pass undef instead of empty $tfa_challenge to authenticate_2nd_new
just above, we check & return if $tfa_challenge is set, so there is no
way that it would be set here. To make it clearer that it must be undef
here pass it as such.
Dominik Csapak [Fri, 21 Oct 2022 08:31:15 +0000 (10:31 +0200)]
authenticate_2nd_new: only lock tfa config for recovery keys
we currently only need to lock the tfa config when we got a recovery key
as a tfa challenge response, since that's the only thing that can
actually change the tfa config (every other method only reads from
there).
so to do that, factor out the code that was inside the lock, and call it
with/without lock depending on the tfa challenge response
Stoiko Ivanov [Mon, 29 Aug 2022 16:07:55 +0000 (18:07 +0200)]
auth ldap/ad: compare group member dn case-insensitively
currently we add a user to a group if it's DN is listed in the
member-attributes of a group. The comparison for this is done via
existence check of a hash key, which is case-sensitive.
The equality for DNs is defined in a not straight forward way [0]:
(roughly translating to you need to honor the equality rules for each
'component' (RDN) of the DN) and is implementation-specific (Microsoft
AD is case-insensitive).
While this patch does not address the complete complexity of comparing
DNs it should work fine in practice.
issue with case-sensitive mismatches was reported in our community
forum:
https://forum.proxmox.com/threads/.113387
tested against a local test-vm used for reproducing the issue.
there is a (hard to trigger) race that can cause a double rotation of
the auth key, with potentially confusing fallout (various processes on
different nodes having an inconsistent view of the current and previous
auth keys, resulting in "random" invalid ticket errors until the next
proper key rotation 24h later).
the underlying cause is that `stat()` calls are excempt from our
otherwise non-cached/direct_io handling of pmxcfs I/O, which allows the
following sequence of events to take place:
LAST: mtime of current auth key
- current epoch advances to LAST + 24h
the following can be arbitrarly interleaved between node A and B:
- LAST+24h node A: pvedaemon/pvestatd on node A calls check_authkey(1)
- LAST+24h node A: it returns 0 (rotation required)
- LAST+24h node A: rotate_key() is called
- LAST+24h node A: cfs_lock_authkey is called
- LAST+24h node B: pvedaemon/pvestatd calls check_authkey(1)
- LAST+24h node B: key is not yet cached in-memory by current process
- LAST+24h node B: key file is opened, stat-ed, read, parsed, and content+mtime
is cached (the kernel will now cache this stat result for 1s unless
the path is opened)
- LAST+24h node B: it returns 0 (rotation required)
- LAST+24h node B: rotate_key() is called
- LAST+24h node B: cfs_lock_authkey is called
the following is mutex-ed via a cfs_lock:
- LAST+24h node A: lock is obtained
- LAST+24h node A: check_authkey() is called
- LAST+24h node A: key is stat-ed, mtime is still (correctly) LAST,
cached mtime and content are returned
- LAST+24h node A: it returns 0 (rotation still required)
- LAST+24h node A: get_pubkey() is called and returns current auth key
- LAST+24h node A: new keypair is generated and persisted
- LAST+24h node A: cfs_lock is released
- LAST+24h node B: changes by node A are processed by pmxcfs
- LAST+24h node B: lock is obtained
- LAST+24h node B: check_authkey() is called
- LAST+24h node B: key is stat-ed, mtime is (incorrectly!) still LAST
since the stat call is handled by the kernel/page cache, not by
pmxcfs, cached mtime and content are returned
- LAST+24h node B: it returns 0 (rotation still required)
- LAST+24h node B: get_pubkey() is called and returns either previous or
key written by node A (depending on whether page cache or pmxcfs
answers stat call)
- LAST+24h node B: new keypair is generated, key returned by last
get_pubkey call is written as old key
the end result is that some nodes and process will treat the key
generated by node A as "current", while others will treat the one
generated by nodoe B as "current". both have the same mtime, so the
in-memory cache hash won't be updated unless the service is restarted or
another rotation happens. depending on who generated the ticket and who
attempts validating it, a ticket might be rejected as invalid even
though the generating party would treat it as valid, and time on all
nodes is properly synced.
there seems to be now way for pmxcfs to pro-actively invalidate the page
cache entry safely (since we'd need to do so while writes to the same
path can happen concurrently), so work around by forcing an open/close
at the (stat) call site which does the work for us. regular reads are
not affected since those already bypass the page cache entirely anyway.
thankfully in almost all cases, the following sequence has enough
synchronization overhead baked in to avoid triggering the issue almost
entirely:
- cfs_lock
- generate key
- create tmp file for old key
- write tmp file
- rename tmp file into proper place
- create tmp file for new pub key
- write tmp file
- rename tmp file into proper place
- create tmp file for new priv key
- write tmp file
- rename tmp file into proper place
- release lock
that being said, there has been at least one report where this was
triggered in the wild from time to time.
it is easy to reproduce by increasing the attr_timeout and entry_timeout
fuse settings inside pmxcfs to increase the time stat results are
treated as valid/retained in the page cache:
in which case it's even easy to trigger more than double rotation in a
bigger test cluster (stopping all PVE services except for pve-cluster
helps avoiding interference):
on a single node:
$ touch --date yesterday /etc/pve/authkey.pub
in parallel (i.e., via tmux synchronized panes):
-----8<-----
use strict;
use warnings;
use PVE::Cluster;
use PVE::AccessControl;
PVE::Cluster::cfs_update();
# ensure page cache entry is there
PVE::AccessControl::check_authkey(1);
PVE::AccessControl::check_authkey(1);
# now attempt rotation
PVE::AccessControl::rotate_authkey();
----->8-----
Thanks to Wolfgang Bumiller for assistance in triaging and exploring
various avenues of fixing.
Mira Limbeck [Wed, 15 Jun 2022 14:09:50 +0000 (16:09 +0200)]
fix #4074: increase API OpenID code size limit to 2048
Azure AD seems to have a variable authorization code size, depending on
the browser state according to one report in bug #4074 [0].
Sometimes the size is greater than our current limit of 1024, so
increase it to 2048.
The RFC [1] mentions that there is no limit to the code size, but based on
current experience, a size limit of 2048 might be enough for every
current OpenID Connect provider.
to detect similar issues to that fixed in the previous commit early on
and without the potential for dangerous side-effects.
root@pam is intentionally still allowed before the check in case such
situations can be triggered by misconfiguration - root@pam can then
still clean up the affected configs via the GUI/API, and not just via
manual editing.
the previous version using an ACL path of '/access/users/{userid}' was
broken for non-root users, since the '@' character always contained in a
userid is not allowed in ACL paths.
this effectively meant that creating API tokens only worked for:
- root@pam (ACL checks skipped altogether)
- users with User.Modify on '/' with propagation (the roles/privs for
'/' are propagated to the undefined path in this case)
- users creating their own tokens (first branch of 'or')
the userid-group check is used for all other modifications of user
entities, so it can also be used for creating/modifying/removing API
tokens.
when multiple roles are defined on a path that share a privilege, this
randomly took the propagation flag for the priv from the last role
encountered. since perl hashes are iterated randomly, this means the
propagation flag was sometimes set correctly, and sometimes not.
note that this propagation flag is only used for display/dumping
purposes, and for intersection with token privs (see next commit).
actual handling of propagation happens on the role level in
PVE::AccessControl::roles().
modified test case (spuriously) fails without the fix.
Dominik Csapak [Mon, 28 Mar 2022 12:38:03 +0000 (14:38 +0200)]
fix #3668: realm-sync: replace 'full' & 'purge' with 'remove-vanished'
Both, 'full' & 'purge', configure what is removed when an entry
(or property) is not returned anymore from the sync result. For users
it may be thus easier to understand setting a single property without
wondering about the interaction of the two specific ones.
The new 'remove-vanished' is a list of things to remove, namely:
* 'acl'
* 'entry'
* 'properties'.
The old 'full' gets mapped to 'entry;properties' and an old 'purge'
to the 'acl'
Keep the old properties for backwards-compatibility but transform to
the new list on edit. Add a deprecation notice to the description
with a TODO reminder to remove with 8.0
Note, this commit changes two things slightly:
* 'purge' (now remove-vanished -> acl) does now remove ACLs for
vanished entries even when we keep those entries. Previously those
were only deleted when we also removed the entries (this can be
seen by the changed regression test)
* since 'remove-vanished' is empty by default, only the scope must be
given explicitly on sync or the sync-default. Previously 'full' and
'purge' must have been configured explicitly (no big deal, since
the default is to not remove anything)
Bug #3668 is fixed when not selecting 'properties' for the sync, so
that existing (custom) properties are not deleted on update
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ T: some commit message typos/wording ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
unchanged: the user 'test@pve' can allocate new '@pve' users, but
only if the created user will belong to at least one of 'test'
(direct ACL for that user) or 'test3' (indirect ACL via 'test' group)
groups.
changed: if the user 'test@pve' updates an existing user, they need
to (A) have 'User.Modify' on at least one existing group of that
user, and (B) 'User.Modify' on all of the groups passed in via the
'groups' parameter. A is the general rule for 'allowed to modify
user' across the board, but was missing for this specific variant of
the check. B was the case before, but just checking this without also
checking A allows a user to pull off-limits users into groups that
they can modify, which then in turn allows them to modify those users
via A which is now passing.
for example, without this patch 'test@pve' would be able to add
'other@pve' to either 'test' or 'test3', and then in turn call any of
the API endpoints that require 'User.Modify' on a user's group
(change TFA, change password or delete user if realm is pve, ..).
all the other userid-group checks without group_param set remain
unchanged as well, since $check_existing_user is true in that case.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
rpcenv: skip undef propagation flag in more places
these are just cosmetic fixes/safeguards against future bugs -
compute_api_permissions is used to set the 'cap' object to hide parts of
the GUI that are not usable without the corresponding privs in the
backend anyway, and get_effective_permissions is only used to return the
permission tree without a specific path query.
this could return undef for the propagation flag instead of 1/0, leading
to confusing displays of permission trees. all the actual checks using
the returned hash check for definedness anyway, so the actual
privileges checked and the displayed ones were not identical.
The userid-* permission check variants work on
$param->{userid} directly which does not exist for this
call. Also, they work on the realm of the user being
checked, rather than the realm provided as parameter.
The result was that as non-root user this always failed
with the message "userid '' too short"
Fix this by making the check explicitly work like in the
description.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
instead of restricting listing tfa entries of others to
root@pam, perform the same checks the user-list does and
which also reflect the permissions of the api calls actually
operating on those users, so, `User.Modify` on the user (but
also `Sys.Audit`, since it's only a read-operation, just
like the user index API call)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>