Bin Liu [Wed, 22 Jun 2016 13:23:37 +0000 (16:23 +0300)]
crypto: omap-sham - set sw fallback to 240 bytes
Adds software fallback support for small crypto requests. In these cases,
it is undesirable to use DMA, as setting it up itself is rather heavy
operation. Gives about 40% extra performance in ipsec usecase.
Signed-off-by: Bin Liu <b-liu@ti.com>
[t-kristo@ti.com: dropped the extra traces, updated some comments
on the code] Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tero Kristo [Wed, 22 Jun 2016 13:23:35 +0000 (16:23 +0300)]
crypto: omap-sham - change queue size from 1 to 10
Change crypto queue size from 1 to 10 for omap SHA driver. This should
allow clients to enqueue requests more effectively to avoid serializing
whole crypto sequences, giving extra performance.
Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tero Kristo [Wed, 22 Jun 2016 13:23:34 +0000 (16:23 +0300)]
crypto: omap-sham - use runtime_pm autosuspend for clock handling
Calling runtime PM API for every block causes serious performance hit to
crypto operations that are done on a long buffer. As crypto is performed
on a page boundary, encrypting large buffers can cause a series of crypto
operations divided by page. The runtime PM API is also called those many
times.
Convert the driver to use runtime_pm autosuspend instead, with a default
timeout value of 1 second. This results in upto ~50% speedup.
Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: kpp - Key-agreement Protocol Primitives API (KPP)
Add key-agreement protocol primitives (kpp) API which allows to
implement primitives required by protocols such as DH and ECDH.
The API is composed mainly by the following functions
* set_secret() - It allows the user to set his secret, also
referred to as his private key, along with the parameters
known to both parties involved in the key-agreement session.
* generate_public_key() - It generates the public key to be sent to
the other counterpart involved in the key-agreement session. The
function has to be called after set_params() and set_secret()
* generate_secret() - It generates the shared secret for the session
Other functions such as init() and exit() are provided for allowing
cryptographic hardware to be inizialized properly before use
Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Megha Dey [Wed, 22 Jun 2016 01:21:46 +0000 (18:21 -0700)]
crypto: sha1-mb - async implementation for sha1-mb
Herbert wants the sha1-mb algorithm to have an async implementation:
https://lkml.org/lkml/2016/4/5/286.
Currently, sha1-mb uses an async interface for the outer algorithm
and a sync interface for the inner algorithm. This patch introduces
a async interface for even the inner algorithm.
Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 21 Jun 2016 08:55:17 +0000 (16:55 +0800)]
crypto: ghash-ce - Fix cryptd reordering
This patch fixes an old bug where requests can be reordered because
some are processed by cryptd while others are processed directly
in softirq context.
The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.
This patch also removes the redundant use of cryptd in the async
init function as init never touches the FPU.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 21 Jun 2016 08:55:16 +0000 (16:55 +0800)]
crypto: ghash-clmulni - Fix cryptd reordering
This patch fixes an old bug where requests can be reordered because
some are processed by cryptd while others are processed directly
in softirq context.
The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.
This patch also removes the redundant use of cryptd in the async
init function as init never touches the FPU.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 21 Jun 2016 08:55:15 +0000 (16:55 +0800)]
crypto: ablk_helper - Fix cryptd reordering
This patch fixes an old bug where requests can be reordered because
some are processed by cryptd while others are processed directly
in softirq context.
The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 21 Jun 2016 08:55:14 +0000 (16:55 +0800)]
crypto: aesni - Fix cryptd reordering problem on gcm
This patch fixes an old bug where gcm requests can be reordered
because some are processed by cryptd while others are processed
directly in softirq context.
The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 21 Jun 2016 08:55:13 +0000 (16:55 +0800)]
crypto: cryptd - Add helpers to check whether a tfm is queued
This patch adds helpers to check whether a given tfm is currently
queued. This is meant to be used by ablk_helper and similar
entities to ensure that no reordering is introduced because of
requests queued in cryptd with respect to requests being processed
in softirq context.
The per-cpu queue length limit is also increased to 1000 in line
with network limits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:40 +0000 (10:08 +0200)]
crypto: marvell - Increase the size of the crypto queue
Now that crypto requests are chained together at the DMA level, we
increase the size of the crypto queue for each engine. The result is
that as the backlog list is reached later, it does not stop the crypto
stack from sending asychronous requests, so more cryptographic tasks
are processed by the engines.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:39 +0000 (10:08 +0200)]
crypto: marvell - Add support for chaining crypto requests in TDMA mode
The Cryptographic Engines and Security Accelerators (CESA) supports the
Multi-Packet Chain Mode. With this mode enabled, multiple tdma requests
can be chained and processed by the hardware without software
intervention. This mode was already activated, however the crypto
requests were not chained together. By doing so, we reduce significantly
the number of IRQs. Instead of being interrupted at the end of each
crypto request, we are interrupted at the end of the last cryptographic
request processed by the engine.
This commits re-factorizes the code, changes the code architecture and
adds the required data structures to chain cryptographic requests
together before sending them to an engine (stopped or possibly already
running).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:38 +0000 (10:08 +0200)]
crypto: marvell - Add load balancing between engines
This commits adds support for fine grained load balancing on
multi-engine IPs. The engine is pre-selected based on its current load
and on the weight of the crypto request that is about to be processed.
The global crypto queue is also moved to each engine. These changes are
required to allow chaining crypto requests at the DMA level. By using
a crypto queue per engine, we make sure that we keep the state of the
tdma chain synchronized with the crypto queue. We also reduce contention
on 'cesa_dev->lock' and improve parallelism.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:37 +0000 (10:08 +0200)]
crypto: marvell - Move SRAM I/O operations to step functions
Currently the crypto requests were sent to engines sequentially.
This commit moves the SRAM I/O operations from the prepare to the step
functions. It provides flexibility for future works and allow to prepare
a request while the engine is running.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:36 +0000 (10:08 +0200)]
crypto: marvell - Add a complete operation for async requests
So far, the 'process' operation was used to check if the current request
was correctly handled by the engine, if it was the case it copied
information from the SRAM to the main memory. Now, we split this
operation. We keep the 'process' operation, which still checks if the
request was correctly handled by the engine or not, then we add a new
operation for completion. The 'complete' method copies the content of
the SRAM to memory. This will soon become useful if we want to call
the process and the complete operations from different locations
depending on the type of the request (different cleanup logic).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:35 +0000 (10:08 +0200)]
crypto: marvell - Move tdma chain out of mv_cesa_tdma_req and remove it
Currently, the only way to access the tdma chain is to use the 'req'
union from a mv_cesa_{ablkcipher,ahash}. This will soon become a problem
if we want to handle the TDMA chaining vs standard/non-DMA processing in
a generic way (with generic functions at the cesa.c level detecting
whether the request should be queued at the DMA level or not). Hence the
decision to move the chain field a the mv_cesa_req level at the expense
of adding 2 void * fields to all request contexts (including non-DMA
ones) and to remove the type completly. To limit the overhead, we get
rid of the type field, which can now be deduced from the req->chain.first
value. Once these changes are done the union is no longer needed, so
remove it and move mv_cesa_ablkcipher_std_req and mv_cesa_req
to mv_cesa_ablkcipher_req directly. There are also no needs to keep the
'base' field into the union of mv_cesa_ahash_req, so move it into the
upper structure.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:34 +0000 (10:08 +0200)]
crypto: marvell - Copy IV vectors by DMA transfers for acipher requests
Add a TDMA descriptor at the end of the request for copying the
output IV vector via a DMA transfer. This is a good way for offloading
as much as processing as possible to the DMA and the crypto engine.
This is also required for processing multiple cipher requests
in chained mode, otherwise the content of the IV vector would be
overwritten by the last processed request.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:33 +0000 (10:08 +0200)]
crypto: marvell - Fix wrong type check in dma functions
So far, the way that the type of a TDMA operation was checked was wrong.
We have to use the type mask in order to get the right part of the flag
containing the type of the operation.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:32 +0000 (10:08 +0200)]
crypto: marvell - Check engine is not already running when enabling a req
Add a BUG_ON() call when the driver tries to launch a crypto request
while the engine is still processing the previous one. This replaces
a silent system hang by a verbose kernel panic with the associated
backtrace to let the user know that something went wrong in the CESA
driver.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Romain Perier [Tue, 21 Jun 2016 08:08:31 +0000 (10:08 +0200)]
crypto: marvell - Add a macro constant for the size of the crypto queue
Adding a macro constant to be used for the size of the crypto queue,
instead of using a numeric value directly. It will be easier to
maintain in case we add more than one crypto queue of the same size.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Carpenter [Fri, 17 Jun 2016 09:16:19 +0000 (12:16 +0300)]
crypto: drbg - fix an error code in drbg_init_sym_kernel()
We accidentally return PTR_ERR(NULL) which is success but we should
return -ENOMEM.
Fixes: 355912852115 ('crypto: drbg - use CTR AES instead of ECB AES') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jeff Garzik [Fri, 17 Jun 2016 05:00:35 +0000 (10:30 +0530)]
crypto: sha3 - Add SHA-3 hash algorithm
This patch adds the implementation of SHA3 algorithm
in software and it's based on original implementation
pushed in patch https://lwn.net/Articles/518415/ with
additional changes to match the padding rules specified
in SHA-3 specification.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Raveendra Padasalagi <raveendra.padasalagi@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Thu, 16 Jun 2016 09:05:46 +0000 (11:05 +0200)]
crypto: caam - fix misspelled upper_32_bits
An endianess fix mistakenly used higher_32_bits() instead of
upper_32_bits(), and that doesn't exist:
drivers/crypto/caam/desc_constr.h: In function 'append_ptr':
drivers/crypto/caam/desc_constr.h:84:75: error: implicit declaration of function 'higher_32_bits' [-Werror=implicit-function-declaration]
*offset = cpu_to_caam_dma(ptr);
Stephan Mueller [Tue, 14 Jun 2016 05:35:37 +0000 (07:35 +0200)]
crypto: drbg - use full CTR AES for update
The CTR DRBG update function performs a full CTR AES operation including
the XOR with "plaintext" data. Hence, remove the XOR from the code and
use the CTR mode to do the XOR.
Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stephan Mueller [Tue, 14 Jun 2016 05:34:13 +0000 (07:34 +0200)]
crypto: drbg - use CTR AES instead of ECB AES
The CTR DRBG derives its random data from the CTR that is encrypted with
AES.
This patch now changes the CTR DRBG implementation such that the
CTR AES mode is employed. This allows the use of steamlined CTR AES
implementation such as ctr-aes-aesni.
Unfortunately there are the following subtile changes we need to apply
when using the CTR AES mode:
- the CTR mode increments the counter after the cipher operation, but
the CTR DRBG requires the increment before the cipher op. Hence, the
crypto_inc is applied to the counter (drbg->V) once it is
recalculated.
- the CTR mode wants to encrypt data, but the CTR DRBG is interested in
the encrypted counter only. The full CTR mode is the XOR of the
encrypted counter with the plaintext data. To access the encrypted
counter, the patch uses a NULL data vector as plaintext to be
"encrypted".
Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The workqueue device_reset_wq has workitem &reset_data->reset_work per
adf_reset_dev_data. The workqueue pf2vf_resp_wq is a workqueue for
PF2VF responses has workitem &pf2vf_resp->pf2vf_resp_work per pf2vf_resp.
The workqueue adf_vf_stop_wq is used to call adf_dev_stop()
asynchronously.
Dedicated workqueues have been used in all cases since the workitems
on the workqueues are involved in operation of crypto which can be used in
the IO path which is depended upon during memory reclaim. Hence,
WQ_MEM_RECLAIM has been set to gurantee forward progress under memory
pressure.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary.
SEC1 doesn't have IPSEC_ESP descriptor type but it is able to perform
IPSEC using HMAC_SNOOP_NO_AFEU, which is also existing on SEC2
In order to be able to define descriptors templates for SEC1 without
breaking SEC2+, we have to give lower priority to HMAC_SNOOP_NO_AFEU
so that SEC2+ selects IPSEC_ESP and not HMAC_SNOOP_NO_AFEU which is
less performant.
This is done by adding a priority field in the template. If the field
is 0, we use the default priority, otherwise we used the one in the
field.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: talitos - Implement AEAD for SEC1 using HMAC_SNOOP_NO_AFEU
This patchs enhances the IPSEC_ESP related functions for them to
also supports the same operations with descriptor type
HMAC_SNOOP_NO_AFEU.
The differences between the two descriptor types are:
* pointeurs 2 and 3 are swaped (Confidentiality key and
Primary EU Context IN)
* HMAC_SNOOP_NO_AFEU has CICV out in pointer 6
* HMAC_SNOOP_NO_AFEU has no primary EU context out so we get it
from the end of data out
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: talitos - making mapping helpers more generic
In preparation of IPSEC for SEC1, first step is to make the mapping
helpers more generic so that they can also be used by AEAD functions.
First, the functions are moved before IPSEC functions in talitos.c
talitos_sg_unmap() and unmap_sg_talitos_ptr() are merged as they
are quite similar, the second one handling the SEC1 case an calling
the first one for SEC2
map_sg_in_talitos_ptr() and map_sg_out_talitos_ptr() are merged
into talitos_sg_map() and enhenced to support offseted zones
as used for AEAD. The actual mapping is now performed outside that
helper. The DMA sync is also done outside to not make it several
times.
talitos_edesc_alloc() size calculation are fixed to also take into
account AEAD specific parts also for SEC1
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: talitos - using helpers for all talitos_ptr operations
Use helper for all modifications to talitos_ptr in preparation to
the implementation of AEAD for SEC1
to_talitos_ptr_extent_clear() has been removed in favor of
to_talitos_ptr_ext_set() to set any value and
to_talitos_ptr_ext_or() to or the extent field with a value
name has been shorten to help keeping single lines of 80 chars
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Bob Ham [Fri, 3 Jun 2016 11:13:08 +0000 (12:13 +0100)]
hwrng: chaoskey - Fix URB warning due to timeout on Alea
The first read on an Alea takes about 1.8 seconds, more than the
timeout value waiting for the read. As a consequence, later URB reuse
causes the warning given below. To avoid this, we increase the wait
time for the first read on the Alea.
Bob Ham [Fri, 3 Jun 2016 11:13:07 +0000 (12:13 +0100)]
hwrng: chaoskey - Add support for Araneus Alea I USB RNG
Adds support for the Araneus Alea I USB hardware Random Number
Generator which is interfaced with in exactly the same way as the
Altus Metrum ChaosKey. We just add the appropriate device ID and
modify the config help text.
Signed-off-by: Bob Ham <bob.ham@collabora.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Megha Dey [Tue, 31 May 2016 21:42:20 +0000 (14:42 -0700)]
crypto: sha1-mb - stylistic cleanup
Currently there are several checkpatch warnings in the sha1_mb.c file:
'WARNING: line over 80 characters' in the sha1_mb.c file. Also, the
syntax of some multi-line comments are not correct. This patch fixes
these issues.
Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: s5p-sss - Use consistent indentation for variables and members
Bring some consistency by:
1. Replacing fixed-space indentation of structure members with just
tabs.
2. Remove indentation in declaration of local variable between type and
name. Driver was mixing usage of such indentation and lack of it.
When removing indentation, reorder variables in
reversed-christmas-tree order with first variables being initialized
ones.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolai Stange [Thu, 26 May 2016 21:19:55 +0000 (23:19 +0200)]
lib/mpi: refactor mpi_read_from_buffer() in terms of mpi_read_raw_data()
mpi_read_from_buffer() and mpi_read_raw_data() do basically the same thing
except that the former extracts the number of payload bits from the first
two bytes of the input buffer.
Besides that, the data copying logic is exactly the same.
Replace the open coded buffer to MPI instance conversion by a call to
mpi_read_raw_data().
Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolai Stange [Thu, 26 May 2016 21:19:54 +0000 (23:19 +0200)]
lib/mpi: mpi_read_from_buffer(): sanitize short buffer printk
The first two bytes of the input buffer encode its expected length and
mpi_read_from_buffer() prints a console message if the given buffer is too
short.
However, there are some oddities with how this message is printed:
- It is printed at the default loglevel. This is different from the
one used in the case that the first two bytes' value is unsupportedly
large, i.e. KERN_INFO.
- The format specifier '%d' is used for unsigned ints.
- It prints the values of nread and *ret_nread. This is redundant since
the former is always the latter + 1.
Clean this up as follows:
- Use pr_info() rather than printk() with no loglevel.
- Use the format specifiers '%u' in place if '%d'.
- Do not print the redundant 'nread' but the more helpful 'nbytes' value.
Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolai Stange [Thu, 26 May 2016 21:19:53 +0000 (23:19 +0200)]
lib/mpi: mpi_read_from_buffer(): return -EINVAL upon too short buffer
Currently, if the input buffer is shorter than the expected length as
indicated by its first two bytes, an MPI instance of this expected length
will be allocated and filled with as much data as is available. The rest
will remain uninitialized.
Instead of leaving this condition undetected, an error code should be
reported to the caller.
Since this situation indicates that the input buffer's first two bytes,
encoding the number of expected bits, are garbled, -EINVAL is appropriate
here.
If the input buffer is shorter than indicated by its first two bytes,
make mpi_read_from_buffer() return -EINVAL.
Get rid of the 'nread' variable: with the new semantics, the total number
of bytes read from the input buffer is known in advance.
Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
mpi_read_from_buffer() reads a MPI from a buffer into a newly allocated
MPI instance. It expects the buffer's leading two bytes to contain the
number of bits, followed by the actual payload.
On failure, it returns NULL and updates the in/out argument ret_nread
somewhat inconsistently:
- If the given buffer is too short to contain the leading two bytes
encoding the number of bits or their value is unsupported, then
ret_nread will be cleared.
- If the allocation of the resulting MPI instance fails, ret_nread is left
as is.
The only user of mpi_read_from_buffer(), digsig_verify_rsa(), simply checks
for a return value of NULL and returns -ENOMEM if that happens.
While this is all of cosmetic nature only, there is another error condition
which currently isn't detectable by the caller of mpi_read_from_buffer():
if the given buffer is too small to hold the number of bits as encoded in
its first two bytes, the return value will be non-NULL and *ret_nread > 0.
In preparation of communicating this condition to the caller, let
mpi_read_from_buffer() return error values by means of the ERR_PTR()
mechanism.
Make the sole caller of mpi_read_from_buffer(), digsig_verify_rsa(),
check the return value for IS_ERR() rather than == NULL. If IS_ERR() is
true, return the associated error value rather than the fixed -ENOMEM.
Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The number of bits, nbits, is calculated in mpi_read_raw_data() as follows:
nbits = nbytes * 8;
Afterwards, the number of leading zero bits of the first byte get
subtracted:
nbits -= count_leading_zeros(buffer[0]);
However, count_leading_zeros() takes an unsigned long and thus,
the u8 gets promoted to an unsigned long.
Thus, the above doesn't subtract the number of leading zeros in the most
significant nonzero input byte from nbits, but the number of leading
zeros of the most significant nonzero input byte promoted to unsigned long,
i.e. BITS_PER_LONG - 8 too many.
Fix this by subtracting
count_leading_zeros(...) - (BITS_PER_LONG - 8)
from nbits only.
Fixes: e1045992949 ("MPILIB: Provide a function to read raw data into an
MPI") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the following warning:
drivers/char/hw_random/stm32-rng.c: In function 'stm32_rng_read':
drivers/char/hw_random/stm32-rng.c:82:19: warning: 'sr' may be used
uninitialized in this function
Reported-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Suggested-by: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
MAINTAINERS: Add file patterns for crypto device tree bindings
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: linux-crypto@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are SoCs like LS1043A where CAAM endianness (BE) does not match
the default endianness of the core (LE).
Moreover, there are requirements for the driver to handle cases like
CPU_BIG_ENDIAN=y on ARM-based SoCs.
This requires for a complete rewrite of the I/O accessors.
PPC-specific accessors - {in,out}_{le,be}XX - are replaced with
generic ones - io{read,write}[be]XX.
Endianness is detected dynamically (at runtime) to allow for
multiplatform kernels, for e.g. running the same kernel image
on LS1043A (BE CAAM) and LS2080A (LE CAAM) armv8-based SoCs.
While here: debugfs entries need to take into consideration the
endianness of the core when displaying data. Add the necessary
glue code so the entries remain the same, but they are properly
read, regardless of the core and/or SEC endianness.
Note: pdb.h fixes only what is currently being used (IPsec).
Reviewed-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Thu, 19 May 2016 15:10:43 +0000 (18:10 +0300)]
asm-generic/io.h: allow barriers in io{read,write}{16,32}be
While reviewing the addition of io{read,write}64be accessors, Arnd
-finds a potential problem:
"If an architecture overrides readq/writeq to have barriers but does
not override ioread64be/iowrite64be, this will lack the barriers and
behave differently from the little-endian version. I think the only
affected architecture is ARC, since ARM and ARM64 both override the
big-endian accessors to have the correct barriers, and all others
don't use barriers at all."
-suggests a fix for the same problem in existing code (16/32-bit
accessors); the fix leads "to a double-swap on architectures that
don't override the io{read,write}{16,32}be accessors, but it will
work correctly on all architectures without them having to override
these accessors."
Stephan Mueller [Mon, 16 May 2016 00:53:36 +0000 (02:53 +0200)]
crypto: user - no parsing of CRYPTO_MSG_GETALG
The CRYPTO_MSG_GETALG netlink message type provides a buffer to the
kernel to retrieve information from the kernel. The data buffer will not
provide any input and will not be read. Hence the nlmsg_parse is not
applicable to this netlink message type.
This patch fixes the following kernel log message when using this
netlink interface:
netlink: 208 bytes leftover after parsing attributes in process `XXX'.
Patch successfully tested with libkcapi from [1] which uses
CRYPTO_MSG_GETALG to obtain cipher-specific information from the kernel.
[1] http://www.chronox.de/libkcapi.html
Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Sun, 29 May 2016 20:28:39 +0000 (13:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of four fixes noticed in the merge window. The aacraid
one is an optimisation, the mp3sas one fixes a spurious printk, the
sd_check_events one fixes a theoretical race and the failed zero
length commands fixes a bug in our completion/retry routines that has
been causing problems in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
aacraid: do not activate events on non-SRC adapters
mpt3sas: add missing curly braces
sd: get disk reference in sd_check_events()
scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
George Spelvin [Sun, 29 May 2016 05:26:41 +0000 (01:26 -0400)]
Rename other copy of hash_string to hashlen_string
The original name was simply hash_string(), but that conflicted with a
function with that name in drivers/base/power/trace.c, and I decided
that calling it "hashlen_" was better anyway.
But you have to do it in two places.
[ This caused build errors for architectures that don't define
CONFIG_DCACHE_WORD_ACCESS - Linus ]
Mikulas Patocka [Tue, 24 May 2016 20:49:18 +0000 (22:49 +0200)]
hpfs: implement the show_options method
The HPFS filesystem used generic_show_options to produce string that is
displayed in /proc/mounts. However, there is a problem that the options
may disappear after remount. If we mount the filesystem with option1
and then remount it with option2, /proc/mounts should show both option1
and option2, however it only shows option2 because the whole option
string is replaced with replace_mount_options in hpfs_remount_fs.
To fix this bug, implement the hpfs_show_options function that prints
options that are currently selected.
Mikulas Patocka [Tue, 24 May 2016 20:48:33 +0000 (22:48 +0200)]
affs: fix remount failure when there are no options changed
Commit c8f33d0bec99 ("affs: kstrdup() memory handling") checks if the
kstrdup function returns NULL due to out-of-memory condition.
However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL. In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists. The mount syscall then fails with
ENOMEM.
This patch fixes the bug. We fail with ENOMEM only if data is non-NULL.
The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).
Mikulas Patocka [Tue, 24 May 2016 20:47:00 +0000 (22:47 +0200)]
hpfs: fix remount failure when there are no options changed
Commit ce657611baf9 ("hpfs: kstrdup() out of memory handling") checks if
the kstrdup function returns NULL due to out-of-memory condition.
However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL. In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists. The mount syscall then fails with
ENOMEM.
This patch fixes the bug. We fail with ENOMEM only if data is non-NULL.
The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).
Fixes: ce657611baf9 ("hpfs: kstrdup() out of memory handling") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VDSO:
- Build microMIPS VDSO for microMIPS kernels.
- Fix aliasing warning by building with `-fno-strict-aliasing' for
debugging but also tracing them might result in recursion.
Misc:
- Add missing FROZEN hotplug notifier transitions.
- Fix clk binding example for varioius PIC32 devices.
- Fix cpu interrupt controller node-names in the DT files.
- Fix XPA CPU feature separation.
- Fix write_gc0_* macros when writing zero.
- Add inline asm encoding helpers.
- Add missing VZ accessor microMIPS encodings.
- Fix little endian microMIPS MSA encodings.
- Add 64-bit HTW fields and fix its configuration.
- Fix sigreturn via VDSO on microMIPS kernel.
- Lots of typo fixes.
- Add definitions of SegCtl registers and use them"
Linus Torvalds [Sat, 28 May 2016 23:15:25 +0000 (16:15 -0700)]
Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
"This series does several related things:
- Makes the dcache hash (fs/namei.c) useful for general kernel use.
(Thanks to Bruce for noticing the zero-length corner case)
- Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
above.
- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.
- Rids the world of the bad hash multipliers in hash_32.
This finishes the job started in commit 689de1d6ca95 ("Minimal
fix-up of bad hashing behavior of hash_64()")
The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.
The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.
- Overhauls the dcache hash mixing.
The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)
- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.
- Sort out partial_name_hash().
The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:
- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"
Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)
On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"
* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add <asm/hash.h>
microblaze: Add <asm/hash.h>
m68k: Add <asm/hash.h>
<linux/hash.h>: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
<linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to <linux/stringhash.h>
George Spelvin [Wed, 25 May 2016 18:19:49 +0000 (14:19 -0400)]
h8300: Add <asm/hash.h>
This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.
Designing H8-specific hash algorithms to work around that is a separate
project. (But if the maintainers would like to get in touch...)
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
George Spelvin [Wed, 25 May 2016 15:06:09 +0000 (11:06 -0400)]
microblaze: Add <asm/hash.h>
Microblaze is an FPGA soft core that can be configured various ways.
If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.
Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>
George Spelvin [Fri, 27 May 2016 02:11:51 +0000 (22:11 -0400)]
<linux/hash.h>: Add support for architecture-specific functions
This is just the infrastructure; there are no users yet.
This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.
That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Philippe De Muyter <phdm@macq.eu> Cc: linux-m68k@lists.linux-m68k.org Cc: Alistair Francis <alistai@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
George Spelvin [Mon, 23 May 2016 11:43:58 +0000 (07:43 -0400)]
fs/namei.c: Improve dcache hash function
Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.
Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.
There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
branches or unrolling to distinguish different loop iterations.
One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.
The key insights in this design are:
1) Barring expensive ops like multiplies, to diffuse one input bit
across 64 bits of hash state takes at least log2(64) = 6 sequentially
dependent instructions. That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
register anyway, and on a 2-operand machine like x86, it's three
instructions.
3) A better use of a second register is to hold a two-word hash state.
With careful design, no temporaries are needed at all, so it doesn't
increase register pressure. And this gets rid of register copying
on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
done in parallel, so on a superscalar processor we get more mixing
in fewer cycles.
I ended up using a mixing function inspired by the ChaCha and Speck
round functions. It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):
x ^= *input++;
y ^= x; x = ROL(x, K1);
x += y; y = ROL(y, K2);
y *= 9;
Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words. This means that at
least 3 words of input are required to create a collision.
(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)
The rotate constants K1 and K2 were found by experiment. The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later. Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.
The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.
The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme. This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.
(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs. I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)
Special thanks to Bruce fields for diligent testing which uncovered a
nasty fencepost error in an earlier version of this patch.
[checkpatch.pl formatting complaints noted and respectfully disagreed with.]
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com>