git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

crypto: user - remove intermediate variable

The use of the v64 intermediate variable is useless, and removing it
bring to much readable code.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - Fix invalid stat reporting

Some error count use the wrong name for getting this data.
But this had not caused any reporting problem, since all error count are shared in the same
union.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - fix use_after_free of struct xxx_request

All crypto_stats functions use the struct xxx_request for feeding stats,
but in some case this structure could already be freed.

For fixing this, the needed parameters (len and alg) will be stored
before the request being executed.
Fixes: cac5818c25d0 ("crypto: user - Implement a generic crypto statistics")
Reported-by: syzbot <syzbot+6939a606a5305e9e9799@syzkaller.appspotmail.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi

This patch converts the getstat example tool to the recent changes done in crypto_user_stat
- changed all stats to u64
- separated struct stats for each crypto alg

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - split user space crypto stat structures

It is cleaner to have each stat in their own structures.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - convert all stats from u32 to u64

All the 32-bit fields need to be 64-bit. In some cases, UINT32_MAX crypto
operations can be done in seconds.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - CRYPTO_STATS should depend on CRYPTO_USER

CRYPTO_STATS is using CRYPTO_USER stuff, so it should depends on it.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - made crypto_user_stat optional

Even if CRYPTO_STATS is set to n, some part of CRYPTO_STATS are
compiled.
This patch made all part of crypto_user_stat uncompiled in that case.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

MAINTAINERS: change NX/VMX maintainers

Add Breno and Nayna as NX/VMX crypto driver maintainers. Also change my
email address to my personal account and remove Leonidas since he's not
working with the driver anymore.

Signed-off-by: Paulo Flabiano Smorigo <pfsmorigo@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

MAINTAINERS: ccree: add co-maintainer

Add Yael Chemla as co-maintainer of Arm TrustZone CryptoCell REE
driver.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dt-bindings: crypto: ccree: add dt bindings for ccree 703

Add device tree bindings associating Arm TrustZone CryptoCell 703 with the
ccree driver.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - add support for CryptoCell 703

Add support for Arm TrustZone CryptoCell 703.
The 703 is a variant of the CryptoCell 713 that supports only
algorithms certified by the Chinesse Office of the State Commercial
Cryptography Administration (OSCCA).

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Merge crypto tree to pick up crypto stats API revert.

crypto: user - Disable statistics interface

Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cavium/nitrox - Enable interrups for PF in SR-IOV mode.

Enable the available interrupt vectors for PF in SR-IOV Mode.
Only single vector entry 192 is valid of PF. This is used to
notify any hardware errors and mailbox messages from VF(s).

Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cavium/nitrox - crypto request format changes

nitrox_skcipher_crypt() will do the necessary formatting/ordering of
input and output sglists based on the algorithm requirements.
It will also accommodate the mandatory output buffers required for
NITROX hardware like Output request headers (ORH) and Completion headers.

Signed-off-by: Nagadheeraj Rottela <rottela.nagadheeraj@cavium.com>
Reviewed-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single block SSSE3 version
to process a single block. Hence we engage that function for (partial)
single block lengths as well.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: do not free algorithm before using

In multiple functions, the algorithm fields are read after its reference
is dropped through crypto_mod_put. In this case, the algorithm memory
may be freed, resulting in use-after-free bugs. This patch delays the
put operation until the algorithm is never used.

Fixes: 79c65d179a40 ("crypto: cbc - Convert to skcipher")
Fixes: a7d85e06ed80 ("crypto: cfb - add support for Cipher FeedBack mode")
Fixes: 043a44001b9e ("crypto: pcbc - Convert to skcipher")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: adiantum - add Adiantum support

Add support for the Adiantum encryption mode.  Adiantum was designed by
Paul Crowley and is specified by our paper:

    Adiantum: length-preserving encryption for entry-level processors
    (https://eprint.iacr.org/2018/720.pdf)

See our paper for full details; this patch only provides an overview.

Adiantum is a tweakable, length-preserving encryption mode designed for
fast and secure disk encryption, especially on CPUs without dedicated
crypto instructions.  Adiantum encrypts each sector using the XChaCha12
stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
function, and an invocation of the AES-256 block cipher on a single
16-byte block.  On CPUs without AES instructions, Adiantum is much
faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
and decryption about 5 times faster.

Adiantum is a specialization of the more general HBSH construction.  Our
earlier proposal, HPolyC, was also a HBSH specialization, but it used a
different εA∆U hash function, one based on Poly1305 only.  Adiantum's
εA∆U hash function, which is based primarily on the "NH" hash function
like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
consequently, Adiantum is about 20% faster than HPolyC.

This speed comes with no loss of security: Adiantum is provably just as
secure as HPolyC, in fact slightly *more* secure.  Like HPolyC,
Adiantum's security is reducible to that of XChaCha12 and AES-256,
subject to a security bound.  XChaCha12 itself has a security reduction
to ChaCha12.  Therefore, one need not "trust" Adiantum; one need only
trust ChaCha12 and AES-256.  Note that the εA∆U hash function is only
used for its proven combinatorical properties so cannot be "broken".

Adiantum is also a true wide-block encryption mode, so flipping any
plaintext bit in the sector scrambles the entire ciphertext, and vice
versa.  No other such mode is available in the kernel currently; doing
the same with XTS scrambles only 16 bytes.  Adiantum also supports
arbitrary-length tweaks and naturally supports any length input >= 16
bytes without needing "ciphertext stealing".

For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
order to make encryption feasible on the widest range of devices.
Although the 20-round variant is quite popular, the best known attacks
on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
security margin; in fact, larger than AES-256's.  12-round Salsa20 is
also the eSTREAM recommendation.  For the block cipher, Adiantum uses
AES-256, despite it having a lower security margin than XChaCha12 and
needing table lookups, due to AES's extensive adoption and analysis
making it the obvious first choice.  Nevertheless, for flexibility this
patch also permits the "adiantum" template to be instantiated with
XChaCha20 and/or with an alternate block cipher.

We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
where currently the only other suitable options are block cipher modes
such as AES-XTS.  A big problem with this is that many low-end mobile
devices (e.g. Android Go phones sold primarily in developing countries,
as well as some smartwatches) still have CPUs that lack AES
instructions, e.g. ARM Cortex-A7.  Sadly, AES-XTS encryption is much too
slow to be viable on these devices.  We did find that some "lightweight"
block ciphers are fast enough, but these suffer from problems such as
not having much cryptanalysis or being too controversial.

The ChaCha stream cipher has excellent performance but is insecure to
use directly for disk encryption, since each sector's IV is reused each
time it is overwritten.  Even restricting the threat model to offline
attacks only isn't enough, since modern flash storage devices don't
guarantee that "overwrites" are really overwrites, due to wear-leveling.
Adiantum avoids this problem by constructing a
"tweakable super-pseudorandom permutation"; this is the strongest
possible security model for length-preserving encryption.

Of course, storing random nonces along with the ciphertext would be the
ideal solution.  But doing that with existing hardware and filesystems
runs into major practical problems; in most cases it would require data
journaling (like dm-integrity) which severely degrades performance.
Thus, for now length-preserving encryption is still needed.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305

Add an ARM NEON implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode. For now, only the
NH portion is actually NEON-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: nhpoly1305 - add NHPoly1305 support

Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash
function used in the Adiantum encryption mode.

CONFIG_NHPOLY1305 is not selectable by itself since there won't be any
real reason to enable it without also enabling Adiantum support.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: poly1305 - add Poly1305 core API

Expose a low-level Poly1305 API which implements the
ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305 MAC
and supports block-aligned inputs only.

This is needed for Adiantum hashing, which builds an εA∆U hash function
from NH and a polynomial evaluation in GF(2^{130}-5); this polynomial
evaluation is identical to the one the Poly1305 MAC does.  However, the
crypto_shash Poly1305 API isn't very appropriate for this because its
calling convention assumes it is used as a MAC, with a 32-byte "one-time
key" provided for every digest.

But by design, in Adiantum hashing the performance of the polynomial
evaluation isn't nearly as critical as NH.  So it suffices to just have
some C helper functions.  Thus, this patch adds such functions.

Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: poly1305 - use structures for key and accumulator

In preparation for exposing a low-level Poly1305 API which implements
the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305
MAC and supports block-aligned inputs only, create structures
poly1305_key and poly1305_state which hold the limbs of the Poly1305
"r" key and accumulator, respectively.

These structures could actually have the same type (e.g. poly1305_val),
but different types are preferable, to prevent misuse.

Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/chacha - add XChaCha12 support

Now that the 32-bit ARM NEON implementation of ChaCha20 and XChaCha20
has been refactored to support varying the number of rounds, add support
for XChaCha12. This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.

XChaCha12 is faster than XChaCha20 but has a lower security margin,
though still greater than AES-256's since the best known attacks make it
through only 7 rounds. See the patch "crypto: chacha - add XChaCha12
support" for more details about why we need XChaCha12 support.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/chacha20 - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor the NEON
implementation of ChaCha20 to support different numbers of rounds.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/chacha20 - add XChaCha20 support

Add an XChaCha20 implementation that is hooked up to the ARM NEON
implementation of ChaCha20. This is needed for use in the Adiantum
encryption mode; see the generic code patch,
"crypto: chacha20-generic - add XChaCha20 support", for more details.

We also update the NEON code to support HChaCha20 on one block, so we
can use that in XChaCha20 rather than calling the generic HChaCha20.
This required factoring the permutation out into its own macro.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/chacha20 - limit the preemption-disabled section

To improve responsivesess, disable preemption for each step of the walk
(which is at most PAGE_SIZE) rather than for the entire
encryption/decryption operation.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha - add XChaCha12 support

Now that the generic implementation of ChaCha20 has been refactored to
allow varying the number of rounds, add support for XChaCha12, which is
the XSalsa construction applied to ChaCha12.  ChaCha12 is one of the
three ciphers specified by the original ChaCha paper
(https://cr.yp.to/chacha/chacha-20080128.pdf: "ChaCha, a variant of
Salsa20"), alongside ChaCha8 and ChaCha20.  ChaCha12 is faster than
ChaCha20 but has a lower, but still large, security margin.

We need XChaCha12 support so that it can be used in the Adiantum
encryption mode, which enables disk/file encryption on low-end mobile
devices where AES-XTS is too slow as the CPUs lack AES instructions.

We'd prefer XChaCha20 (the more popular variant), but it's too slow on
some of our target devices, so at least in some cases we do need the
XChaCha12-based version.  In more detail, the problem is that Adiantum
is still much slower than we're happy with, and encryption still has a
quite noticeable effect on the feel of low-end devices.  Users and
vendors push back hard against encryption that degrades the user
experience, which always risks encryption being disabled entirely.  So
we need to choose the fastest option that gives us a solid margin of
security, and here that's XChaCha12.  The best known attack on ChaCha
breaks only 7 rounds and has 2^235 time complexity, so ChaCha12's
security margin is still better than AES-256's.  Much has been learned
about cryptanalysis of ARX ciphers since Salsa20 was originally designed
in 2005, and it now seems we can be comfortable with a smaller number of
rounds.  The eSTREAM project also suggests the 12-round version of
Salsa20 as providing the best balance among the different variants:
combining very good performance with a "comfortable margin of security".

Note that it would be trivial to add vanilla ChaCha12 in addition to
XChaCha12.  However, it's unneeded for now and therefore is omitted.

As discussed in the patch that introduced XChaCha20 support, I
considered splitting the code into separate chacha-common, chacha20,
xchacha20, and xchacha12 modules, so that these algorithms could be
enabled/disabled independently.  However, since nearly all the code is
shared anyway, I ultimately decided there would have been little benefit
to the added complexity.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha20-generic - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor
chacha20-generic to support different numbers of rounds.  The
justification for needing XChaCha12 support is explained in more detail
in the patch "crypto: chacha - add XChaCha12 support".

The only difference between ChaCha{8,12,20} are the number of rounds
itself; all other parts of the algorithm are the same.  Therefore,
remove the "20" from all definitions, structures, functions, files, etc.
that will be shared by all ChaCha versions.

Also make ->setkey() store the round count in the chacha_ctx (previously
chacha20_ctx).  The generic code then passes the round count through to
chacha_block().  There will be a ->setkey() function for each explicitly
allowed round count; the encrypt/decrypt functions will be the same.  I
decided not to do it the opposite way (same ->setkey() function for all
round counts, with different encrypt/decrypt functions) because that
would have required more boilerplate code in architecture-specific
implementations of ChaCha and XChaCha.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha20-generic - add XChaCha20 support

Add support for the XChaCha20 stream cipher.  XChaCha20 is the
application of the XSalsa20 construction
(https://cr.yp.to/snuffle/xsalsa-20081128.pdf) to ChaCha20 rather than
to Salsa20.  XChaCha20 extends ChaCha20's nonce length from 64 bits (or
96 bits, depending on convention) to 192 bits, while provably retaining
ChaCha20's security.  XChaCha20 uses the ChaCha20 permutation to map the
key and first 128 nonce bits to a 256-bit subkey.  Then, it does the
ChaCha20 stream cipher with the subkey and remaining 64 bits of nonce.

We need XChaCha support in order to add support for the Adiantum
encryption mode.  Note that to meet our performance requirements, we
actually plan to primarily use the variant XChaCha12.  But we believe
it's wise to first add XChaCha20 as a baseline with a higher security
margin, in case there are any situations where it can be used.
Supporting both variants is straightforward.

Since XChaCha20's subkey differs for each request, XChaCha20 can't be a
template that wraps ChaCha20; that would require re-keying the
underlying ChaCha20 for every request, which wouldn't be thread-safe.
Instead, we make XChaCha20 its own top-level algorithm which calls the
ChaCha20 streaming implementation internally.

Similar to the existing ChaCha20 implementation, we define the IV to be
the nonce and stream position concatenated together.  This allows users
to seek to any position in the stream.

I considered splitting the code into separate chacha20-common, chacha20,
and xchacha20 modules, so that chacha20 and xchacha20 could be
enabled/disabled independently.  However, since nearly all the code is
shared anyway, I ultimately decided there would have been little benefit
to the added complexity of separate modules.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha20-generic - don't unnecessarily use atomic walk

chacha20-generic doesn't use SIMD instructions or otherwise disable
preemption, so passing atomic=true to skcipher_walk_virt() is
unnecessary.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha20-generic - add HChaCha20 library function

Refactor the unkeyed permutation part of chacha20_block() into its own
function, then add hchacha20_block() which is the ChaCha equivalent of
HSalsa20 and is an intermediate step towards XChaCha20 (see
https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha20 skips the
final addition of the initial state, and outputs only certain words of
the state. It should not be used for streaming directly.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations

'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC
in the mask to crypto_alloc_shash() has no effect. Many users therefore
already don't pass it, but some still do. This inconsistency can cause
confusion, especially since the way the 'mask' argument works is
somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations

'cipher' algorithms (single block ciphers) are always synchronous, so
passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
effect. Many users therefore already don't pass it, but some still do.
This inconsistency can cause confusion, especially since the way the
'mask' argument works is somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: remove useless initializations of cra_list

Some algorithms initialize their .cra_list prior to registration.
But this is unnecessary since crypto_register_alg() will overwrite
.cra_list when adding the algorithm to the 'crypto_alg_list'.
Apparently the useless assignment has just been copy+pasted around.

So, remove the useless assignments.

Exception: paes_s390.c uses cra_list to check whether the algorithm is
registered or not, so I left that as-is for now.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: inside-secure - remove useless setting of type flags

Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
flags") took care of this everywhere else, but a few more instances made
it into the tree at about the same time. Squash them before they get
copy+pasted around again.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ecc - regularize scalar for scalar multiplication

ecc_point_mult is supposed to be used with a regularized scalar,
otherwise, it's possible to deduce the position of the top bit of the
scalar with timing attack. This is important when the scalar is a
private key.

ecc_point_mult is already using a regular algorithm (i.e. having an
operation flow independent of the input scalar) but regularization step
is not implemented.

Arrange scalar to always have fixed top bit by adding a multiple of the
curve order (n).

References:
The constant time regularization step is based on micro-ecc by Kenneth
MacKay and also referenced in the literature (Bernstein, D. J., & Lange,
T. (2017). Montgomery curves and the Montgomery ladder. (Cryptology
ePrint Archive; Vol. 2017/293). s.l.: IACR. Chapter 4.6.2.)

Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Cc: kernel-hardening@lists.openwall.com
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Add a 4-block AVX2 variant

This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.

Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.

The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Add a 2-block AVX2 variant

This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.

This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Use larger block functions more aggressively

Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant

Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.

To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant

Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.

As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant

Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.

The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

hwrng: bcm2835 - Switch to SPDX identifier

Adopt the SPDX license identifier headers to ease license compliance
management. While we are at this fix the comment style, too.

Cc: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

padata: clean an indentation issue, remove extraneous space

Trivial fix to clean up an indentation issue

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam/qi2 - add support for Chacha20 + Poly1305

Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam/jr - add support for Chacha20 + Poly1305

Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)

Signed-off-by: Cristian Stoica <cristian.stoica@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chacha20poly1305 - export CHACHAPOLY_IV_SIZE

Move CHACHAPOLY_IV_SIZE to header file, so it can be reused.

Signed-off-by: Cristian Stoica <cristian.stoica@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam/qi2 - add support for ChaCha20

Add support for ChaCha20 skcipher algorithm.

Signed-off-by: Carmen Iorga <carmen.iorga@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - add register map changes cf. Era 10

Era 10 changes the register map.

The updates that affect the drivers:
-new version registers are added
-DBG_DBG[deco_state] field is moved to a new register -
DBG_EXEC[19:16] @ 8_0E3Ch.

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: mxs-dcp - Add support for dcp clk

On 6ull and 6sll the DCP block has a clock which needs to be explicitly
enabled.

Add minimal handling for this at probe/remove time.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dt-bindings: crypto: Mention clocks for mxs-dcp

Explicit clock enabling is required on 6sll and 6ull so mention that
standard clock bindings are used.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: streebog - add Streebog test vectors

Add testmgr and tcrypt tests and vectors for Streebog hash function
from RFC 6986 and GOST R 34.11-2012, for HMAC-Streebog vectors are
from RFC 7836 and R 50.1.113-2016.

Cc: linux-integrity@vger.kernel.org
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: streebog - register Streebog in hash info for IMA

Register Streebog hash function in Hash Info arrays to let IMA use
it for its purposes.

Cc: linux-integrity@vger.kernel.org
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: streebog - add Streebog hash function

Add GOST/IETF Streebog hash function (GOST R 34.11-2012, RFC 6986)
generic hash transformation.

Cc: linux-integrity@vger.kernel.org
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: aes-ce - Remove duplicate header

Remove asm/hwcap.h which is included more than once

Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: bcm - fix normal/non key hash algorithm failure

Remove setkey() callback handler for normal/non key
hash algorithms and keep it for AES-CBC/CMAC which needs key.

Fixes: 9d12ba86f818 ("crypto: brcm - Add Broadcom SPU driver")
Signed-off-by: Raveendra Padasalagi <raveendra.padasalagi@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cts - document NIST standard status

cts(cbc(aes)) as used in the kernel has been added to NIST
standard as CBC-CS3. Document it as such.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Suggested-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ecc - check for invalid values in the key verification test

Currently used scalar multiplication algorithm (Matthieu Rivain, 2011)
have invalid values for scalar == 1, n-1, and for regularized version
n-2, which was previously not checked. Verify that they are not used as
private keys.

Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - mark cts(cbc(aes)) as FIPS allowed

As per Sp800-38A addendum from Oct 2010[1], cts(cbc(aes)) is
allowed as a FIPS mode algorithm. Mark it as such.

[1] https://csrc.nist.gov/publications/detail/sp/800-38a/addendum/final

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - clean up report structure copying

There have been a pretty ridiculous number of issues with initializing
the report structures that are copied to userspace by NETLINK_CRYPTO.
Commit 4473710df1f8 ("crypto: user - Prepare for CRYPTO_MAX_ALG_NAME
expansion") replaced some strncpy()s with strlcpy()s, thereby
introducing information leaks.  Later two other people tried to replace
other strncpy()s with strlcpy() too, which would have introduced even
more information leaks:

    - https://lore.kernel.org/patchwork/patch/954991/
    - https://patchwork.kernel.org/patch/10434351/

Commit cac5818c25d0 ("crypto: user - Implement a generic crypto
statistics") also uses the buggy strlcpy() approach and therefore leaks
uninitialized memory to userspace.  A fix was proposed, but it was
originally incomplete.

Seeing as how apparently no one can get this right with the current
approach, change all the reporting functions to:

- Start by memsetting the report structure to 0.  This guarantees it's
  always initialized, regardless of what happens later.
- Initialize all strings using strscpy().  This is safe after the
  memset, ensures null termination of long strings, avoids unnecessary
  work, and avoids the -Wstringop-truncation warnings from gcc.
- Use sizeof(var) instead of sizeof(type).  This is more robust against
  copy+paste errors.

For simplicity, also reuse the -EMSGSIZE return value from nla_put().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - remove redundant reporting functions

The acomp, akcipher, and kpp algorithm types already have .report
methods defined, so there's no need to duplicate this functionality in
crypto_user itself; the duplicate functions are actually never executed.
Remove the unused code.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: chelsio - clean up various indentation issues

Trivial fix to clean up varous indentation issue

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

pcrypt: use format specifier in kobject_add

Passing string 'name' as the format specifier is potentially hazardous
because name could (although very unlikely to) have a format specifier
embedded in it causing issues when parsing the non-existent arguments
to these. Follow best practice by using the "%s" format string for
the string 'name'.

Cleans up clang warning:
crypto/pcrypt.c:397:40: warning: format string is not a string literal
(potentially insecure) [-Wformat-security]

Fixes: a3fb1e330dd2 ("pcrypt: Added sysfs interface to pcrypt")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - add AES-CFB tests

Add AES128/192/256-CFB testvectors from NIST SP800-38A.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cfb - fix decryption

crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
IV, rather than with data stream, resulting in incorrect decryption.
Test vectors will be added in the next patch.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cavium/nitrox - fix a DMA pool free failure

In crypto_alloc_context(), a DMA pool is allocated through dma_pool_alloc()
to hold the crypto context. The meta data of the DMA pool, including the
pool used for the allocation 'ndev->ctx_pool' and the base address of the
DMA pool used by the device 'dma', are then stored to the beginning of the
pool. These meta data are eventually used in crypto_free_context() to free
the DMA pool through dma_pool_free(). However, given that the DMA pool can
also be accessed by the device, a malicious device can modify these meta
data, especially when the device is controlled to deploy an attack. This
can cause an unexpected DMA pool free failure.

To avoid the above issue, this patch introduces a new structure
crypto_ctx_hdr and a new field chdr in the structure nitrox_crypto_ctx hold
the meta data information of the DMA pool after the allocation. Note that
the original structure ctx_hdr is not changed to ensure the compatibility.

Cc: <stable@vger.kernel.org>
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - add SM3 support

Add support for SM3 cipher in CryptoCell 713.

Signed-off-by: Yael Chemla <yael.chemla@foss.arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - modify set_cipher_mode usage from cc_hash

encapsulate set_cipher_mode call with another api,
preparation for specific hash behavior as needed in later patches
when SM3 introduced.

Signed-off-by: Yael Chemla <yael.chemla@foss.arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - adjust hash length to suit certain context specifics

Adjust hash length such that it will not be fixed and general for all algs.
Instead make it suitable for certain context information.
This is preparation for SM3 support.

Signed-off-by: Yael Chemla <yael.chemla@foss.arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - add SM4 support

Add support for SM4 cipher in CryptoCell 713.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dt-bindings: crypto: ccree: add ccree 713

Add device tree bindings associating Arm TrustZone CryptoCell 713 with the
ccree driver.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccree - add support for CryptoCell 713

Add support for Arm TrustZone CryptoCell 713.
Note that this patch just enables using a 713 in backwards compatible mode
to 712. Newer 713 specific features will follow.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm/aes - add some hardening against cache-timing attacks

Make the ARM scalar AES implementation closer to constant-time by
disabling interrupts and prefetching the tables into L1 cache.  This is
feasible because due to ARM's "free" rotations, the main tables are only
1024 bytes instead of the usual 4096 used by most AES implementations.

On ARM Cortex-A7, the speed loss is only about 5%.  The resulting code
is still over twice as fast as aes_ti.c.  Responsiveness is potentially
a concern, but interrupts are only disabled for a single AES block.

Note that even after these changes, the implementation still isn't
necessarily guaranteed to be constant-time; see
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
of the many difficulties involved in writing truly constant-time AES
software.  But it's valuable to make such attacks more difficult.

Much of this patch is based on patches suggested by Ard Biesheuvel.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: aes_ti - disable interrupts while accessing S-box

In the "aes-fixed-time" AES implementation, disable interrupts while
accessing the S-box, in order to make cache-timing attacks more
difficult.  Previously it was possible for the CPU to be interrupted
while the S-box was loaded into L1 cache, potentially evicting the
cachelines and causing later table lookups to be time-variant.

In tests I did on x86 and ARM, this doesn't affect performance
significantly.  Responsiveness is potentially a concern, but interrupts
are only disabled for a single AES block.

Note that even after this change, the implementation still isn't
necessarily guaranteed to be constant-time; see
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
of the many difficulties involved in writing truly constant-time AES
software.  But it's valuable to make such attacks more difficult.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - Zeroize whole structure given to user space

For preventing uninitialized data to be given to user-space (and so leak
potential useful data), the crypto_stat structure must be correctly
initialized.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: cac5818c25d0 ("crypto: user - Implement a generic crypto statistics")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
[EB: also fix it in crypto_reportstat_one()]
[EB: use sizeof(var) rather than sizeof(type)]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: user - fix leaking uninitialized memory to userspace

All bytes of the NETLINK_CRYPTO report structures must be initialized,
since they are copied to userspace. The change from strncpy() to
strlcpy() broke this. As a minimal fix, change it back.

Fixes: 4473710df1f8 ("crypto: user - Prepare for CRYPTO_MAX_ALG_NAME expansion")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: simd - correctly take reqsize of wrapped skcipher into account

The simd wrapper's skcipher request context structure consists
of a single subrequest whose size is taken from the subordinate
skcipher. However, in simd_skcipher_init(), the reqsize that is
retrieved is not from the subordinate skcipher but from the
cryptd request structure, whose size is completely unrelated to
the actual wrapped skcipher.

Reported-by: Qian Cai <cai@gmx.us>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Qian Cai <cai@gmx.us>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: hisilicon - Fix reference after free of memories on error path

coccicheck currently warns of the following issues in the driver:
drivers/crypto/hisilicon/sec/sec_algs.c:864:51-66: ERROR: reference preceded by free on line 812
drivers/crypto/hisilicon/sec/sec_algs.c:864:40-49: ERROR: reference preceded by free on line 813
drivers/crypto/hisilicon/sec/sec_algs.c:861:8-24: ERROR: reference preceded by free on line 814
drivers/crypto/hisilicon/sec/sec_algs.c:860:41-51: ERROR: reference preceded by free on line 815
drivers/crypto/hisilicon/sec/sec_algs.c:867:7-18: ERROR: reference preceded by free on line 816

It would appear than on certain error paths that we may attempt reference-
after-free some memories.

This patch fixes those issues. The solution doesn't look perfect, but
having same memories free'd possibly from separate functions makes it
tricky.

Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: hisilicon - Fix NULL dereference for same dst and src

When the source and destination addresses for the cipher are the same, we
will get a NULL dereference from accessing the split destination
scatterlist memories, as shown:

[   56.565719] tcrypt:
[   56.565719] testing speed of async ecb(aes) (hisi_sec_aes_ecb) encryption
[   56.574683] tcrypt: test 0 (128 bit key, 16 byte blocks):
[   56.587585] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[   56.596361] Mem abort info:
[   56.599151]   ESR = 0x96000006
[   56.602196]   Exception class = DABT (current EL), IL = 32 bits
[   56.608105]   SET = 0, FnV = 0
[   56.611149]   EA = 0, S1PTW = 0
[   56.614280] Data abort info:
[   56.617151]   ISV = 0, ISS = 0x00000006
[   56.620976]   CM = 0, WnR = 0
[   56.623930] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[   56.630533] [0000000000000000] pgd=0000041fc7e4d003, pud=0000041fcd9bf003, pmd=0000000000000000
[   56.639224] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[   56.644782] Modules linked in: tcrypt(+)
[   56.648695] CPU: 21 PID: 2326 Comm: insmod Tainted: G        W         4.19.0-rc6-00001-g3fabfb8-dirty #716
[   56.658420] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT17 Nemo 2.0 RC0 10/05/2018
[   56.667537] pstate: 20000005 (nzCv daif -PAN -UAO)
[   56.672322] pc : sec_alg_skcipher_crypto+0x318/0x748
[   56.677274] lr : sec_alg_skcipher_crypto+0x178/0x748
[   56.682224] sp : ffff0000118e3840
[   56.685525] x29: ffff0000118e3840 x28: ffff841fbb3f8118
[   56.690825] x27: 0000000000000000 x26: 0000000000000000
[   56.696125] x25: ffff841fbb3f8080 x24: ffff841fbadc0018
[   56.701425] x23: ffff000009119000 x22: ffff841fbb24e280
[   56.706724] x21: ffff841ff212e780 x20: ffff841ff212e700
[   56.712023] x19: 0000000000000001 x18: ffffffffffffffff
[   56.717322] x17: 0000000000000000 x16: 0000000000000000
[   56.722621] x15: ffff0000091196c8 x14: 72635f7265687069
[   56.727920] x13: 636b735f676c615f x12: ffff000009119940
[   56.733219] x11: 0000000000000000 x10: 00000000006080c0
[   56.738519] x9 : 0000000000000000 x8 : ffff841fbb24e480
[   56.743818] x7 : ffff841fbb24e500 x6 : ffff841ff00cdcc0
[   56.749117] x5 : 0000000000000010 x4 : 0000000000000000
[   56.754416] x3 : ffff841fbb24e380 x2 : ffff841fbb24e480
[   56.759715] x1 : 0000000000000000 x0 : ffff000008f682c8
[   56.765016] Process insmod (pid: 2326, stack limit = 0x(____ptrval____))
[   56.771702] Call trace:
[   56.774136]  sec_alg_skcipher_crypto+0x318/0x748
[   56.778740]  sec_alg_skcipher_encrypt+0x10/0x18
[   56.783259]  test_skcipher_speed+0x2a0/0x700 [tcrypt]
[   56.788298]  do_test+0x18f8/0x48c8 [tcrypt]
[   56.792469]  tcrypt_mod_init+0x60/0x1000 [tcrypt]
[   56.797161]  do_one_initcall+0x5c/0x178
[   56.800985]  do_init_module+0x58/0x1b4
[   56.804721]  load_module+0x1da4/0x2150
[   56.808456]  __se_sys_init_module+0x14c/0x1e8
[   56.812799]  __arm64_sys_init_module+0x18/0x20
[   56.817231]  el0_svc_common+0x60/0xe8
[   56.820880]  el0_svc_handler+0x2c/0x80
[   56.824615]  el0_svc+0x8/0xc
[   56.827483] Code: a94c87a3 910b2000 f87b7842 f9004ba2 (b87b7821)
[   56.833564] ---[ end trace 0f63290590e93d94 ]---
Segmentation fault

Fix this by only accessing these memories when we have different src and
dst.

Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Linux 4.20-rc1

Merge tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs

Pull UBIFS updates from Richard Weinberger:

- Full filesystem authentication feature, UBIFS is now able to have the
   whole filesystem structure authenticated plus user data encrypted and
   authenticated.

- Minor cleanups

* tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs: (26 commits)
  ubifs: Remove unneeded semicolon
  Documentation: ubifs: Add authentication whitepaper
  ubifs: Enable authentication support
  ubifs: Do not update inode size in-place in authenticated mode
  ubifs: Add hashes and HMACs to default filesystem
  ubifs: authentication: Authenticate super block node
  ubifs: Create hash for default LPT
  ubfis: authentication: Authenticate master node
  ubifs: authentication: Authenticate LPT
  ubifs: Authenticate replayed journal
  ubifs: Add auth nodes to garbage collector journal head
  ubifs: Add authentication nodes to journal
  ubifs: authentication: Add hashes to index nodes
  ubifs: Add hashes to the tree node cache
  ubifs: Create functions to embed a HMAC in a node
  ubifs: Add helper functions for authentication support
  ubifs: Add separate functions to init/crc a node
  ubifs: Format changes for authentication support
  ubifs: Store read superblock node
  ubifs: Drop write_node
  ...

Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

  Bugfix:
   - Fix build issues on architectures that don't provide 64-bit cmpxchg

  Cleanups:
   - Fix a spelling mistake"

* tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fix spelling mistake, EACCESS -> EACCES
  SUNRPC: Use atomic(64)_t for seq_send(64)

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more timer updates from Thomas Gleixner:
"A set of commits for the new C-SKY architecture timers"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: timer: gx6605s SOC timer
  clocksource/drivers/c-sky: Add gx6605s SOC system timer
  dt-bindings: timer: C-SKY Multi-processor timer
  clocksource/drivers/c-sky: Add C-SKY SMP timer

Merge tag 'ntb-4.20' of git://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
"Fairly minor changes and bug fixes:

  NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of
  private struct, and a few bug fixes"

* tag 'ntb-4.20' of git://github.com/jonmason/ntb:
  ntb: idt: Alter the driver info comments
  ntb: idt: Discard temperature sensor IRQ handler
  ntb: idt: Add basic hwmon sysfs interface
  ntb: idt: Alter temperature read method
  ntb_netdev: Simplify remove with client device drvdata
  NTB: transport: Try harder to alloc an aligned MW buffer
  ntb: ntb_transport: Mark expected switch fall-throughs
  ntb: idt: Set PCIe bus address to BARLIMITx
  NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
  ntb: intel: fix return value for ndev_vec_mask()
  ntb_netdev: fix sleep time mismatch

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"A memory (under-)allocation fix and a comment fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/topology: Fix off by one bug
sched/rt: Update comment in pick_next_task_rt()

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"A number of fixes and some late updates:

   - make in_compat_syscall() behavior on x86-32 similar to other
     platforms, this touches a number of generic files but is not
     intended to impact non-x86 platforms.

   - objtool fixes

   - PAT preemption fix

   - paravirt fixes/cleanups

   - cpufeatures updates for new instructions

   - earlyprintk quirk

   - make microcode version in sysfs world-readable (it is already
     world-readable in procfs)

   - minor cleanups and fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compat: Cleanup in_compat_syscall() callers
  x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
  objtool: Support GCC 9 cold subfunction naming scheme
  x86/numa_emulation: Fix uniform-split numa emulation
  x86/paravirt: Remove unused _paravirt_ident_32
  x86/mm/pat: Disable preemption around __flush_tlb_all()
  x86/paravirt: Remove GPL from pv_ops export
  x86/traps: Use format string with panic() call
  x86: Clean up 'sizeof x' => 'sizeof(x)'
  x86/cpufeatures: Enumerate MOVDIR64B instruction
  x86/cpufeatures: Enumerate MOVDIRI instruction
  x86/earlyprintk: Add a force option for pciserial device
  objtool: Support per-function rodata sections
  x86/microcode: Make revision and processor flags world-readable

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates and fixes from Ingo Molnar:
"These are almost all tooling updates: 'perf top', 'perf trace' and
  'perf script' fixes and updates, an UAPI header sync with the merge
  window versions, license marker updates, much improved Sparc support
  from David Miller, and a number of fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
  perf intel-pt/bts: Calculate cpumode for synthesized samples
  perf intel-pt: Insert callchain context into synthesized callchains
  perf tools: Don't clone maps from parent when synthesizing forks
  perf top: Start display thread earlier
  tools headers uapi: Update linux/if_link.h header copy
  tools headers uapi: Update linux/netlink.h header copy
  tools headers: Sync the various kvm.h header copies
  tools include uapi: Update linux/mmap.h copy
  perf trace beauty: Use the mmap flags table generated from headers
  perf beauty: Wire up the mmap flags table generator to the Makefile
  perf beauty: Add a generator for MAP_ mmap's flag constants
  tools include uapi: Update asound.h copy
  tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies
  tools include uapi: Update linux/fs.h copy
  perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
  perf cs-etm: Correct CPU mode for samples
  perf unwind: Take pgoff into account when reporting elf to libdwfl
  perf top: Do not use overwrite mode by default
  perf top: Allow disabling the overwrite mode
  perf trace: Beautify mount's first pathname arg
  ...

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
"An irqchip driver fix and a memory (over-)allocation fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function
irq/matrix: Fix memory overallocation

sched/topology: Fix off by one bug

With the addition of the NUMA identity level, we increased @level by
one and will run off the end of the array in the distance sort loop.

Fixed: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'core/urgent' into x86/urgent, to pick up objtool fix

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A few fixes who have come in near or during the merge window:

   - Removal of a VLA usage in Marvell mpp platform code

   - Enable some IPMI options for ARM64 servers by default, helps
     testing

   - Enable PREEMPT on 32-bit ARMv7 defconfig

   - Minor fix for stm32 DT (removal of an unused DMA property)

   - Bugfix for TI OMAP1-based ams-delta (-EINVAL -> IRQ_NOTCONNECTED)"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: stm32: update HASH1 dmas property on stm32mp157c
  ARM: orion: avoid VLA in orion_mpp_conf
  ARM: defconfig: Update multi_v7 to use PREEMPT
  arm64: defconfig: Enable some IPMI configs
  soc: ti: QMSS: Fix usage of irq_set_affinity_hint
  ARM: OMAP1: ams-delta: Fix impossible .irq < 0

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Catalin Marinas:

- fix W+X page (mark RO) allocated by the arm64 kprobes code

- Makefile fix for .i files in out of tree modules

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kprobe: make page to RO mode when allocate it
  arm64: kdump: fix small typo
  arm64: makefile fix build of .i file in external module case

Merge tag 'dma-mapping-4.20-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
"Avoid compile warnings on non-default arm64 configs"

* tag 'dma-mapping-4.20-2' of git://git.infradead.org/users/hch/dma-mapping:
arm64: fix warnings without CONFIG_IOMMU_DMA

Merge tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- clean-up leftovers in Kconfig files

- remove stale oldnoconfig and silentoldconfig targets

- remove unneeded cc-fullversion and cc-name variables

- improve merge_config script to allow overriding option prefix

* tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: remove cc-name variable
  kbuild: replace cc-name test with CONFIG_CC_IS_CLANG
  merge_config.sh: Allow to define config prefix
  kbuild: remove unused cc-fullversion variable
  kconfig: remove silentoldconfig target
  kconfig: remove oldnoconfig target
  powerpc: PCI_MSI needs PCI
  powerpc: remove CONFIG_MCA leftovers
  powerpc: remove CONFIG_PCI_QSPAN
  scsi: aha152x: rename the PCMCIA define

Merge tag '4.20-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes and updates from Steve French:
"Three small fixes (one Kerberos related, one for stable, and another
  fixes an oops in xfstest 377), two helpful debugging improvements,
  three patches for cifs directio and some minor cleanup"

* tag '4.20-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix signed/unsigned mismatch on aio_read patch
  cifs: don't dereference smb_file_target before null check
  CIFS: Add direct I/O functions to file_operations
  CIFS: Add support for direct I/O write
  CIFS: Add support for direct I/O read
  smb3: missing defines and structs for reparse point handling
  smb3: allow more detailed protocol info on open files for debugging
  smb3: on kerberos mount if server doesn't specify auth type use krb5
  smb3: add trace point for tree connection
  cifs: fix spelling mistake, EACCESS -> EACCES
  cifs: fix return value for cifs_listxattr

Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 9p fix from Al Viro:
"Regression fix for net/9p handling of iov_iter; broken by braino when
switching to iov_iter_is_kvec() et.al., spotted and fixed by Marc"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
iov_iter: Fix 9p virtio breakage

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"This is a set of minor small (and safe changes) that didn't make the
  initial pull request plus some bug fixes"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mvsas: Remove set but not used variable 'id'
  scsi: qla2xxx: Remove two arguments from qlafx00_error_entry()
  scsi: qla2xxx: Make sure that qlafx00_ioctl_iosb_entry() initializes 'res'
  scsi: qla2xxx: Remove a set-but-not-used variable
  scsi: qla2xxx: Make qla2x00_sysfs_write_nvram() easier to analyze
  scsi: qla2xxx: Declare local functions 'static'
  scsi: qla2xxx: Improve several kernel-doc headers
  scsi: qla2xxx: Modify fall-through annotations
  scsi: 3w-sas: 3w-9xxx: Use unsigned char for cdb
  scsi: mvsas: Use dma_pool_zalloc
  scsi: target: Don't request modules that aren't even built
  scsi: target: Set response length for REPORT TARGET PORT GROUPS