Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD
implementations of ChaCha20. This can be used by Adiantum.
An SSSE3 implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation. This
required refactoring the ChaCha permutation into its own function.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>